STATISTICS ASSIGNMENT NUMBER ONE **(DUE JANUARY 7, 2002)**


Q1. Calculate the mean for the following series of numbers (2 marks)

15, 20, 34, 12, 28

Answer: (15 + 20 + 34 + 12 + 28) / 5 = 109 / 5 = 21.8
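The same calculation can be checked with a few lines of Python. This is only an illustrative sketch; the variable names `scores` and `mean` are not part of the assignment.

```python
# Series from Q1
scores = [15, 20, 34, 12, 28]

# Mean = sum of the values divided by how many values there are
mean = sum(scores) / len(scores)

print(mean)
```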

Q2a. Calculate the median for the following series of numbers (4 marks)

34, 1, 25, 18, 5, 10, 12

Answer: Arrange the numbers from smallest to largest, then find the middle number. **The median is 12** (the middle number of the series 1, 5, 10, 12, 18, 25, 34 is 12).

Q2b. Calculate the median for the following series of numbers (4 marks)

18, 1, 11, 5, 9, 15

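Answer: Arrange the numbers from smallest to largest, then find the middle number. With an even number of values there are two middle numbers, so take their mean. **The median is 10** (the middle numbers of the series 1, 5, 9, 11, 15, 18 are 9 and 11, and (9 + 11) / 2 = 10).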
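The two cases above (odd and even number of values) can be sketched as one small Python function. The function name `median` is chosen for illustration only.

```python
def median(values):
    """Return the median: the middle value of the sorted series,
    or the mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([34, 1, 25, 18, 5, 10, 12]))  # Q2a, seven values
print(median([18, 1, 11, 5, 9, 15]))       # Q2b, six values
```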

Q3. Physical Trait X by gender

| Physical Trait X | Male | Female |
|------------------|------|--------|
| Present          | 5    | 10     |
| Absent           | 15   | 20     |

A χ^{2} (chi-square) test was carried out on the above data and the following results were obtained:

χ^{2} (chi-square) = 0.397

Critical value = 3.841

α = 0.05

p > 0.05
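As a check on the reported statistic, the sketch below recomputes the Pearson chi-square from the observed counts. It assumes no continuity (Yates) correction was applied, an assumption that happens to reproduce the 0.397 quoted above. The function name `chi_square` is illustrative, and the critical value is simply copied from the question rather than computed.

```python
def chi_square(table):
    """Pearson chi-square for a table of observed counts
    (assumption: no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the trait and gender were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: trait present, trait absent. Columns: male, female.
observed = [[5, 10], [15, 20]]
chi2 = chi_square(observed)
critical_value = 3.841  # taken from the question (alpha = 0.05)

print(round(chi2, 3))
print("reaches critical value:", chi2 >= critical_value)
```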

Q3a. For the above statistical test, write down the null hypothesis and the research hypothesis (4 marks)

**H_{0} (null hypothesis): there is no association between gender and occurrence of the physical trait. Any association seen is due to chance.**

**H_{1} (research hypothesis): there is a statistically significant association between gender and occurrence of the physical trait.**

Q3b. What is your decision with respect to the null hypothesis and the research hypothesis? (2 marks)

Answer: **Accept (i.e. fail to reject) the null hypothesis**

Q3c. Give **TWO** reasons for your decision (4 marks)

**p > 0.05**, i.e. the probability that the calculated χ^{2} is due to chance is too high (more than 0.05, or more than 5%) to be considered statistically significant.

**The calculated chi-square (0.397) does not equal or exceed the critical value (3.841).**

Q4. The standard deviation for 20 student scores in an exam is 5.8 marks (mean score is 65 marks), while the standard deviation for a second group of 25 students for the same exam is 9.5 marks (mean score is 52 marks).

(a) What can you conclude about the spread of the scores for Group 1 students versus Group 2 students? (4 marks)

**Scores for Group 2 students are more spread out** (because the standard deviation for Group 2 scores is higher).

(b) What can you conclude if both sets of scores are normally distributed? (6 marks)

From our knowledge of the Normal Curve,

**For Group 1, 68.3% of all scores lie between 59.2 and 70.8 marks**

(59.2 marks = mean – 1 standard deviation = 65 – 5.8; 70.8 marks = mean + 1 standard deviation = 65 + 5.8)

**For Group 2, 68.3% of all scores lie between 42.5 and 61.5 marks**

(42.5 marks = mean – 1 standard deviation = 52 – 9.5; 61.5 marks = mean + 1 standard deviation = 52 + 9.5)

**For Group 1, 95.5% of all scores lie between 53.4 and 76.6 marks**

(53.4 marks = mean – 2 standard deviations = 65 – 2(5.8); 76.6 marks = mean + 2 standard deviations = 65 + 2(5.8))

**For Group 2, 95.5% of all scores lie between 33 and 71 marks**

(33 marks = mean – 2 standard deviations = 52 – 2(9.5); 71 marks = mean + 2 standard deviations = 52 + 2(9.5))

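**For Group 1, 99.7% of all scores lie between 47.6 and 82.4 marks**

(47.6 marks = mean – 3 standard deviations = 65 – 3(5.8); 82.4 marks = mean + 3 standard deviations = 65 + 3(5.8))

**For Group 2, 99.7% of all scores lie between 23.5 and 80.5 marks**

(23.5 marks = mean – 3 standard deviations = 52 – 3(9.5); 80.5 marks = mean + 3 standard deviations = 52 + 3(9.5))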
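The intervals for both groups follow one pattern, mean ± k standard deviations for k = 1, 2, 3, so they can be generated in a short loop. The sketch below uses the percentages quoted in this answer; the names `sd_interval`, `groups` and `coverage` are illustrative.

```python
def sd_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd), rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

# (mean, standard deviation) for each group, from Q4
groups = {"Group 1": (65, 5.8), "Group 2": (52, 9.5)}

# Share of scores within k standard deviations under the Normal Curve,
# using the percentages quoted in this answer
coverage = {1: 68.3, 2: 95.5, 3: 99.7}

for name, (mean, sd) in groups.items():
    for k, pct in coverage.items():
        low, high = sd_interval(mean, sd, k)
        print(f"{name}: {pct}% of scores lie between {low} and {high} marks")
```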


Q5. A t-test was carried out on the data in Q4 (the exam scores for Group 1 and Group 2) and the following results were obtained:

Calculated t = 5.365

Critical value = 1.96

α = 0.05

p < 0.01

Q5a. For the above statistical test, write down the null hypothesis and the research hypothesis (4 marks)

**H_{0} (null hypothesis): there is no difference between the two underlying population means. Any difference seen is due to chance.**

In other words, if all students from the underlying Group 1 population (from which the sample of 20 is drawn) and from the underlying Group 2 population (from which the sample of 25 is drawn) were given the exam, the mean score for the Group 1 population would be the same as the mean score for the Group 2 population.

**H_{1} (research hypothesis): there is a statistically significant difference between the two underlying population means.**

Q5b. What is your decision with respect to the null hypothesis and the research hypothesis? (2 marks)

Answer: **Accept the research hypothesis**

Q5c. Give **TWO** reasons for your decision (4 marks)

**p < 0.01**, i.e. the probability that the calculated t is due to chance is much less than 0.05 (much less than 5%). The probability is less than 1%. Therefore, the difference in means is statistically significant.

**The calculated t (5.365) exceeds the critical value (1.96).**
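The assignment does not say which form of the t-test was used. As a hedged sketch, the code below assumes a pooled-variance (equal variances) two-sample t-test and treats the quoted standard deviations as sample standard deviations. Under those assumptions the summary figures from Q4 reproduce the reported t of 5.365. The function name `pooled_t` is illustrative, and the critical value is copied from the question.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics,
    assuming equal population variances (pooled variance)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Group 1: mean 65, SD 5.8, n = 20. Group 2: mean 52, SD 9.5, n = 25.
t = pooled_t(65, 5.8, 20, 52, 9.5, 25)
critical_value = 1.96  # taken from the question (alpha = 0.05)

print(round(t, 3))
print("exceeds critical value:", abs(t) > critical_value)
```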


Q6. Abstract from an article published in the American Journal of Epidemiology 2001 Nov 1; 154(9):803-808

“Environmental Tobacco Smoke Exposure and Overtime Work as Risk Factors for Sick Building Syndrome in Japan” by Mizoue T, Reijula K, Andersson K

Sick building syndrome (SBS) is an increasingly common health problem for workers in modern office buildings. It is characterized by irritation of mucous membranes and the skin and general malaise. The impact of environmental tobacco smoke (ETS) exposure and overtime work on these symptoms remains unclear. The authors examined these relations using data from a 1998 cross-sectional survey of 1,281 municipal employees who worked in a variety of buildings in a Japanese city … Among nonsmokers, the odds ratio for the association between study-defined SBS and 4 hours of ETS exposure per day was 2.7 (95% confidence interval: 1.6, 4.8) …

Q6a. Name the **TWO** risk factors in the above study (2 marks)

**Environmental tobacco smoke (ETS) and overtime work**

Q6b. Is the odds ratio statistically significant? Explain how you arrived at your answer. (4 marks)

**Yes, the 95% Confidence Interval for the odds ratio does not contain the value 1**

Since the 95% Confidence Interval for the true odds ratio does not contain 1, we can be 95% sure that the true odds ratio is not equal to 1.

Therefore, we reject the null hypothesis (Odds ratio = 1) and accept the research hypothesis (Odds ratio is not equal to 1).
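The decision rule used here (significant if the confidence interval excludes the null value of 1) can be written as a one-line check. This is a minimal sketch using the figures from the abstract; the function name `ci_excludes_null` is illustrative.

```python
def ci_excludes_null(lower, upper, null_value=1.0):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)

# Odds ratio and 95% confidence interval quoted in the abstract
odds_ratio, ci_low, ci_high = 2.7, 1.6, 4.8

print("statistically significant:", ci_excludes_null(ci_low, ci_high))
```

An interval such as (0.8, 4.8) would contain 1, and the same check would then return False.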

Q6c. What is your conclusion about the association between non-smoker exposure to 4 hours of ETS per day and SBS for the above study? (4 marks)

**There is a statistically significant association between non-smoker exposure to 4 hours of ETS per day and Sick Building Syndrome.**

**For this study of non-smokers, SBS victims are 2.7 times more likely to have been exposed to 4 hours of ETS per day than those who don’t suffer from SBS.**