Our Definition of Tukey’s Test Statistic
Suppose we have $ r $ independent observations: $${ Y_1 }, \cdots ,{ Y_r } \mathop \sim\limits^{ iid } N\left( { \mu ,{ \sigma ^2 } } \right)$$
Let $$W = \operatorname{range}(Y) = \max_i Y_{ i } - \min_i Y_{ i }$$
Now suppose that we have an estimate $ s^2 $ of the variance $ \sigma^2 $ that is based on $ v $ degrees of freedom and is independent of the $ Y_i $ \(\left(i = 1,\cdots,r \right)\). In practice, $ v $ usually comes from an analysis of variance.
Then Tukey’s test statistic is:
$$q_{ r,v } = \frac{ W }{ s }$$
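As a rough check of this definition, the sketch below simulates $ q = W/s $ many times and compares it with the critical value that R's `qtukey` returns. The values of `r`, `v`, `sigma`, the mean of 10, and the seed are arbitrary assumptions chosen only for illustration; the independent variance estimate is drawn directly from its scaled chi-square distribution.

```r
# Monte Carlo sketch of the externally studentized range q = W / s.
# All settings below are illustrative assumptions, not values from the post.
set.seed(1)          # arbitrary seed for reproducibility
r     <- 4           # number of observations entering the range
v     <- 16          # degrees of freedom of the variance estimate
sigma <- 2           # true standard deviation
n_sim <- 20000       # number of simulated statistics

q_sim <- replicate(n_sim, {
  y <- rnorm(r, mean = 10, sd = sigma)       # Y_1, ..., Y_r
  s <- sigma * sqrt(rchisq(1, df = v) / v)   # independent estimate: v s^2 / sigma^2 ~ chi^2_v
  (max(y) - min(y)) / s                      # q = W / s
})

crit        <- qtukey(0.95, nmeans = r, df = v)  # upper 5% point of q_{r,v}
exceed_rate <- mean(q_sim > crit)                # should land near 0.05
print(c(critical_value = crit, exceed_rate = exceed_rate))
```

If the definition is right, the simulated statistic should exceed the critical value in roughly 5% of the runs, whatever mean and standard deviation are chosen.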
Ranges: Internalized
Let $ W=X_{ n }-X_{ 1 } $, where $ X_{ 1 } \le \cdots \le X_{ n } $ denote the ordered observations.
Internally Studentized Range: Population $ \sigma^2 $ is unknown
$$q_{ n,n-1 } = \frac{ W }{ s } = \frac{ X_{ n }-X_{ 1 } }{ s },$$
where $ s = \left( \frac{ 1 }{ \left( { n - 1 } \right) }{ \sum\limits_{ i = 1 }^n { { { \left( { { X_i } - \bar X } \right) }^2 } } } \right)^{ 1/2 } $
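A minimal sketch of the internally studentized range in R follows. The function name `internal_studentized_range` and the sample `x` are hypothetical and used only for illustration; `sd()` already uses the $ n-1 $ divisor that appears in the definition of $ s $.

```r
# Internally studentized range: the sample's own standard deviation
# is used in the denominator.
internal_studentized_range <- function(x) {
  stopifnot(length(x) >= 2)   # a range needs at least two observations
  w <- max(x) - min(x)        # W = largest minus smallest observation
  s <- sd(x)                  # sample standard deviation (n - 1 divisor)
  w / s
}

x     <- c(2, 4, 4, 5, 10)    # hypothetical sample: range 8, sd 3
q_int <- internal_studentized_range(x)
print(q_int)
```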
Ranges: Externalized
Let $ W=X_{ n }-X_{ 1 } $
Externally Studentized Range: Population $ \sigma^2 $ is unknown AND an independent estimator $ s_{ v }^2 $ of $ \sigma^2 $ is available with degrees of freedom $ v $.
$$T=\frac{ W }{ s_v }=\frac{ X_{ n }-X_{ 1 } }{ s_v }$$
The distribution of $ W $ depends on the unknown $ \sigma $; studentization removes this dependence. Dividing $ W $ by $ s_{ v } $ works because $ vs_{ v }^{ 2 }/\sigma^2 \sim \chi_{ v }^2 $ independently of $ W $, so $ \sigma $ cancels in the ratio $ T $.
Ranges: Both
Let $ W=X_{ n }-X_{ 1 } $
Externally and Internally Studentized Range:
$$T=\frac{ W }{ \tilde{ s } }=\frac{ X_{ n }-X_{ 1 } }{ \tilde{ s } },$$
where $ \tilde{ s } = \left( \frac{ { \sum\limits_{ i = 1 }^n { { { \left( { { X_i } - \bar X } \right) }^2 } } } + vs_{ v }^2 }{ n - 1 + v } \right)^{ 1/2 } $
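The external and the combined versions can be sketched side by side in R. The helper `pooled_sd` and the numbers `x`, `s_v`, and `v` are hypothetical; the pooled estimate adds the internal sum of squares to $ vs_{ v }^2 $ and divides by the combined $ n - 1 + v $ degrees of freedom.

```r
# Pooled standard deviation: internal sum of squares plus v * s_v^2,
# divided by the combined degrees of freedom n - 1 + v.
pooled_sd <- function(x, s_v, v) {
  n  <- length(x)
  ss <- sum((x - mean(x))^2)            # internal sum of squares
  sqrt((ss + v * s_v^2) / (n - 1 + v))
}

x   <- c(2, 4, 4, 5, 10)   # hypothetical sample (sum of squares 36)
s_v <- 2                   # hypothetical independent estimate of sigma
v   <- 6                   # its degrees of freedom

w          <- max(x) - min(x)        # W
t_external <- w / s_v                # externally studentized range
s_tilde    <- pooled_sd(x, s_v, v)   # pooled estimate of sigma
t_both     <- w / s_tilde            # externally and internally studentized range
print(c(external = t_external, pooled_sd = s_tilde, both = t_both))
```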
Our use
We will be using the Externally Studentized Range….
Specifically, our definition resembles….
Tukey’s Studentized Range Test Statistic ANOVA
Let \({ Y_{ ij } } \mathop \sim\limits^{ iid } N\left( { 0 ,{ \sigma ^2 } } \right)\), where $ i=1,\cdots,k $ and $ j=1,\cdots,n $, be independent observations in a balanced one-way ANOVA with \(k\) treatments and \(n\) observations per treatment.
Then \(\bar{Y}_1, \cdots , \bar{Y}_k\) are the treatment sample averages and \(S^2\) is the independent and unbiased estimator of \(\sigma^2\) based on \(v = k\left( { n-1 } \right)\) degrees of freedom.
Let $ W $ be the range of $ \bar{ Y }_i $.
$$q_{ k,v } = \frac{ W }{ S/\sqrt{ n } } = \frac{ \max_i \bar{ Y }_{ i\cdot } - \min_i \bar{ Y }_{ i\cdot } }{ \sqrt{ MS_{ error }/n } }$$
Example - Data
Consider a dosage regimen with repeated measures:
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Row Total |
|---|---|---|---|---|---|---|
| Dose 1 | 27.0 | 26.2 | 28.8 | 33.5 | 28.8 | 144.3 |
| Dose 2 | 22.8 | 23.1 | 27.7 | 27.6 | 24.0 | 125.2 |
| Dose 3 | 21.9 | 23.4 | 20.1 | 27.8 | 19.3 | 112.5 |
| Dose 4 | 23.5 | 19.6 | 23.7 | 20.8 | 23.9 | 111.5 |
| Col Total | 95.2 | 92.3 | 100.3 | 109.7 | 96.0 | 493.5 |
In R
(dose_type = c(rep("1",5), rep("2",5), rep("3",5), rep("4",5)))
## [1] "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "3" "3" "3" "3" "3" "4" "4"
## [18] "4" "4" "4"
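# Enter the table row by row (one row per dose, one column per sample)
data = matrix(c(27.0, 26.2, 28.8, 33.5, 28.8,
                22.8, 23.1, 27.7, 27.6, 24.0,
                21.9, 23.4, 20.1, 27.8, 19.3,
                23.5, 19.6, 23.7, 20.8, 23.9), nrow = 4, byrow = TRUE)
# Flatten to one vector, ordered dose by dose to match dose_type
samples = c(data[1,], data[2,], data[3,], data[4,])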
## [1] 27.0 26.2 28.8 33.5 28.8 22.8 23.1 27.7 27.6 24.0 21.9 23.4 20.1 27.8
## [15] 19.3 23.5 19.6 23.7 20.8 23.9
ANOVA Table:
model = glm(samples~factor(dose_type))
aov(model)
## Call:
## aov(formula = model)
##
## Terms:
## factor(dose_type) Residuals
## Sum of Squares 140.0935 116.3240
## Deg. of Freedom 3 16
##
## Residual standard error: 2.69634
## Estimated effects may be unbalanced
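The sums of squares in this table can be reproduced by hand, which also yields the studentized range statistic from the ANOVA definition (range of the group means divided by $ \sqrt{ MS_{ error }/n } $). The sketch below re-enters the table as a matrix named `dose`; that name and the other variable names are assumptions made for this example.

```r
# Hand computation of the one-way ANOVA quantities for the dose data.
dose <- matrix(c(27.0, 26.2, 28.8, 33.5, 28.8,
                 22.8, 23.1, 27.7, 27.6, 24.0,
                 21.9, 23.4, 20.1, 27.8, 19.3,
                 23.5, 19.6, 23.7, 20.8, 23.9),
               nrow = 4, byrow = TRUE)   # rows = doses, columns = samples

k <- nrow(dose)                          # number of treatments
n <- ncol(dose)                          # observations per treatment
group_means <- rowMeans(dose)
grand_mean  <- mean(dose)

ss_between <- n * sum((group_means - grand_mean)^2)
# A length-k vector is recycled down the columns, so row i has mean i subtracted.
ss_within  <- sum((dose - group_means)^2)
df_within  <- k * (n - 1)
mse        <- ss_within / df_within      # MS_error

# Studentized range statistic: range of the means over sqrt(MS_error / n)
q_stat <- (max(group_means) - min(group_means)) / sqrt(mse / n)

print(c(ss_between = ss_between, ss_within = ss_within,
        resid_se = sqrt(mse), q = q_stat))
```

The first three printed values should agree with the sums of squares and the residual standard error reported by `aov`.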
Tukey’s 95% C.I. pairwise comparison process
A $ 100 \times \left({ 1-\alpha } \right) $% simultaneous C.I. for each pairwise difference $ \mu_{ i }-\mu_{ j } $ is given by
$$\left( { { \bar Y }_i - { \bar Y }_j } \right) \pm \frac{ q_{ \alpha ;J,N - J } }{ \sqrt 2 } \cdot s_{ pooled } \cdot \sqrt { \frac{ 1 }{ n_i } + \frac{ 1 }{ n_j } } ,{ \text{ } }s_{ pooled } = \sqrt { MSW } $$
where $ J $ is the number of groups, $ N $ is the total number of observations, and $ n_i $, $ n_j $ are the sizes of the two groups being compared.
Note: Always start with the pair formed by the largest and the smallest mean. If that difference is not significant, then no difference between any two means lying between them can be significant either.
Largest vs. Smallest
$$\left( { 28.86 - 22.30 } \right) \pm \frac{ { { q_{ .05 ;4,20 - 4 } } } }{ { \sqrt 2 } } \cdot \sqrt{ \left(116.3/16\right) } \cdot \sqrt { \frac{ 1 }{ { { 5 } } } + \frac{ 1 }{ { { 5 } } } } , $$
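Here $ q_{ .05 ;4,20 - 4 } = 4.046093 $, computed in R as qtukey(0.95,4,20-4).
\(6.56 \pm 4.8788568\), i.e. roughly \(\left( 1.68, 11.44 \right)\). The interval excludes zero, so the largest and smallest means differ significantly at the 5% level.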
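The same interval can be computed directly from the summary numbers: the row totals from the data table and the residual sum of squares from the ANOVA table. The variable names below are assumptions made for this sketch.

```r
# Tukey interval for the largest-vs-smallest pair, from summary statistics.
group_means <- c(144.3, 125.2, 112.5, 111.5) / 5   # row totals / n
mse      <- 116.3240 / 16                          # MSW = residual SS / residual df
n        <- 5                                      # observations per dose
k        <- 4                                      # number of doses (J)
df_error <- 16                                     # N - J = 20 - 4

q_crit   <- qtukey(0.95, nmeans = k, df = df_error)
margin   <- q_crit / sqrt(2) * sqrt(mse) * sqrt(1 / n + 1 / n)
diff_max <- max(group_means) - min(group_means)    # 28.86 - 22.30
ci       <- diff_max + c(-1, 1) * margin

print(c(diff = diff_max, margin = margin, lower = ci[1], upper = ci[2]))
```

Up to sign, the endpoints should match the `4-1` row of the `TukeyHSD` output, since that row reports the difference as Dose 4 minus Dose 1.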
In R:
TukeyHSD(aov(model))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = model)
##
## $`factor(dose_type)`
## diff lwr upr p adj
## 2-1 -3.82 -8.698941 1.058941 0.1546383
## 3-1 -6.36 -11.238941 -1.481059 0.0088837
## 4-1 -6.56 -11.438941 -1.681059 0.0069985
## 3-2 -2.54 -7.418941 2.338941 0.4661560
## 4-2 -2.74 -7.618941 2.138941 0.4026925
## 4-3 -0.20 -5.078941 4.678941 0.9993976
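So only the 3-1 and 4-1 comparisons are significant at the 5% level: their intervals exclude zero and their adjusted p-values are below 0.05.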
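The adjusted p-values in this table can be recovered, for a balanced design like this one, by evaluating the upper tail of the studentized range distribution at $ \left| \bar Y_i - \bar Y_j \right| / \sqrt{ MSW/n } $. The sketch below works from the summary statistics only; the variable names are assumptions.

```r
# Adjusted p-values for all pairwise comparisons via ptukey.
group_means <- c(144.3, 125.2, 112.5, 111.5) / 5   # dose means from the row totals
mse      <- 116.3240 / 16                          # MSW from the ANOVA table
n        <- 5                                      # observations per dose
k        <- 4                                      # number of doses
df_error <- 16                                     # residual degrees of freedom

pairs <- combn(k, 2)                               # columns: (1,2), (1,3), ..., (3,4)
diffs <- group_means[pairs[2, ]] - group_means[pairs[1, ]]
q_obs <- abs(diffs) / sqrt(mse / n)                # studentized differences
p_adj <- ptukey(q_obs, nmeans = k, df = df_error, lower.tail = FALSE)
names(p_adj) <- paste(pairs[2, ], pairs[1, ], sep = "-")

print(round(p_adj, 7))
print(names(p_adj)[p_adj < 0.05])                  # comparisons significant at 5%
```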
That’s All folks!
That just about covers the Tukey Test Statistic…. The only bits left are an alternate formulation and the references used to construct this post.
Tukey’s Test Statistic Rewritten
Let \({ X_1 }, \cdots ,{ X_m } \mathop \sim\limits^{ iid } N\left( { 0 ,{ \sigma ^2 } } \right)\), where $ m \ge 2 $, and let $ Z $ be independent of the $ X_i $ with $ Z/\sigma^2 \sim \chi^{ 2 } $ on $ n $ degrees of freedom, so that $ Z/n $ estimates $ \sigma^2 $.
$$ q = \frac{ { \mathop { \max }\limits_{ 1 \leqslant i,j \leqslant m } \left| { { X_i } - { X_j } } \right| } }{ { \sqrt { Z/n } } } = \frac{ { { X_{ m:m } } - { X_{ 1:m } } } }{ { \sqrt { Z/n } } }$$
In the case of $ m=2 $, $ q $ closely resembles the two-sample $ t $ test statistic.
That is, $ X_{ 1 } $ and $ X_{ 2 } $ are taken to be the standardized sample means of the two samples and $ Z/n $ is the pooled sample variance:
$$ S_p^2 = \frac{ { S_1^2\left( { { n_1 } - 1 } \right) + S_2^2\left( { { n_2 } - 1 } \right) } }{ { { n_1 } + { n_2 } - 2 } },{ \text{ } }{ S_E } = { S_p }\sqrt { \frac{ 1 }{ { { n_1 } } } + \frac{ 1 }{ { { n_2 } } } } $$
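For two samples of equal size the resemblance is exact up to a constant: $ q = \sqrt{ 2 }\left| t \right| $, and the studentized range p-value with two means equals the two-sided pooled $ t $ test p-value. The sketch below illustrates this with the Dose 1 and Dose 2 rows of the example data; the variable names are assumptions.

```r
# m = 2: studentized range versus the pooled two-sample t statistic.
x <- c(27.0, 26.2, 28.8, 33.5, 28.8)   # Dose 1
y <- c(22.8, 23.1, 27.7, 27.6, 24.0)   # Dose 2
n <- length(x)                         # equal group sizes

# Pooled variance S_p^2 with n1 = n2 = n
sp2 <- ((n - 1) * var(x) + (n - 1) * var(y)) / (2 * n - 2)

tt     <- t.test(x, y, var.equal = TRUE)            # pooled two-sample t test
t_stat <- unname(tt$statistic)
q_stat <- abs(mean(x) - mean(y)) / sqrt(sp2 / n)    # studentized range of the two means

p_t <- tt$p.value
p_q <- ptukey(q_stat, nmeans = 2, df = 2 * n - 2, lower.tail = FALSE)

print(c(q = q_stat, sqrt2_abs_t = sqrt(2) * abs(t_stat), p_t = p_t, p_q = p_q))
```

Note that this two-sample comparison pools only the two groups involved (8 degrees of freedom), so its p-value is not the `2-1` adjusted p-value from the four-group analysis.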
References
- David, H. A. “Studentized Range.” Encyclopedia of statistical sciences. New York: Wiley, 2006. 1-3. Print.
- Falk, Michael, and Frank Marohn. “The One-Way Analysis of Variance.” Foundations of statistical analyses and applications with SAS. Basel: Birkhauser Verlag, 2002. 193-194. Print.
- Harter, H. Leon, and N. Balakrishnan. “The Studentized Range of Samples from a Normal Population.” Tables for the use of range and studentized range in tests of hypotheses. Boca Raton, Fla.: CRC Press, 1998. 52-53. Print.
- Hochberg, Yosef. “Studentized Range.” Encyclopedia of Biostatistics. Hoboken, NJ: John Wiley & Sons, Ltd, 2005. 1-3. Print.
- Lowry, Richard. “Ch 14: One-Way Analysis of Variance for Independent Samples.” Concepts & Applications of Inferential Statistics. 2000. 1-3. Print.