Tukey Studentized Range and You!

Our Definition of Tukey’s Test Statistic

Suppose we have $ r $ independent observations $${ Y_1 }, \cdots ,{ Y_r } \mathop \sim\limits^{ iid } N\left( { \mu ,{ \sigma ^2 } } \right)$$ and let $$W = \operatorname{range}(Y) = \max\left(Y_{ i }\right)-\min\left(Y_{ i }\right).$$

Now, suppose that we have an estimate $ s^2 $ of the variance $ \sigma^2 $ which is based on $ v $ degrees of freedom and is independent of the $ Y_i $ $ \left(i = 1,\cdots,r \right) $. Both $ s^2 $ and $ v $ are usually obtained from an analysis of variance (the error mean square and its degrees of freedom).

Then Tukey’s test statistic is:

$$q_{ r,v } = \frac{ W }{ s }$$
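To make this concrete, here is a minimal simulation sketch in R; the particular values of r, v, and sigma below are arbitrary illustrations, not part of the definition.

set.seed(42)
r = 5; v = 10; sigma = 2                 # illustrative values only
Y = rnorm(r, mean = 0, sd = sigma)       # r iid normal observations
W = max(Y) - min(Y)                      # the range W
s = sigma * sqrt(rchisq(1, df = v) / v)  # independent estimate of sigma on v df
q = W / s                                # Tukey's studentized range statistic
ptukey(q, nmeans = r, df = v)            # P(Q <= q) under the reference distribution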

Ranges: Internalized

Let $ W=X_{ n }-X_{ 1 } $

Internally Studentized Range: Population $ \sigma^2 $ is unknown

$$q_{ n,n-1 } = \frac{ W }{ s } = \frac{ X_{ n }-X_{ 1 } }{ s },$$

where $ s = \left( \frac{ 1 }{ \left( { n - 1 } \right) }{ \sum\limits_{ i = 1 }^n { { { \left( { { X_i } - \bar X } \right) }^2 } } } \right)^{ 1/2 } $
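For a single sample this is a one-liner in R (a sketch using an arbitrary simulated sample x):

set.seed(1)
x = rnorm(8)                   # hypothetical sample of size n = 8
(max(x) - min(x)) / sd(x)      # q_{n, n-1}: range divided by the sample's own sd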

Ranges: Externalized

Let $ W=X_{ n }-X_{ 1 } $

Externally Studentized Range: Population $ \sigma^2 $ is unknown AND an independent estimator $ s_{ v }^2 $ of $ \sigma^2 $ is available with degrees of freedom $ v $.

$$T=\frac{ W }{ s_v }=\frac{ X_{ n }-X_{ 1 } }{ s_v }$$

The dependence of the distribution of $ W $ on the unknown $ \sigma $ can be removed by studentization, since $ vS_{ v }^{ 2 }/\sigma^2 \sim \chi_{ v }^2 $ independently of $ W $.
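As a quick sanity check, the following sketch simulates many externally studentized ranges and compares them against the ptukey reference distribution; the choices n = 4, v = 12, and 10,000 replications are arbitrary.

set.seed(7)
n = 4; v = 12; reps = 10000
W     = apply(matrix(rnorm(n * reps), nrow = reps), 1, function(x) max(x) - min(x))
sv    = sqrt(rchisq(reps, df = v) / v)          # independent estimates of sigma = 1, each on v df
q_ext = W / sv                                  # externally studentized ranges
mean(q_ext <= qtukey(0.95, nmeans = n, df = v)) # should be close to 0.95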

Ranges: Both

Let $ W=X_{ n }-X_{ 1 } $

Externally and Internally Studentized Range:

$$T=\frac{ W }{ \tilde{ s } }=\frac{ X_{ n }-X_{ 1 } }{ \tilde{ s } },$$

where $ \tilde{ s } = \left( \frac{ \sum\limits_{ i = 1 }^n { \left( { X_i } - \bar X \right)^2 } + vs_{ v }^2 }{ n - 1 + v } \right)^{ 1/2 } $
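A small helper for the pooled estimate $ \tilde{ s } $ might look like the sketch below; the name pooled_sd and its arguments are just placeholders for a sample, an external standard-deviation estimate, and its degrees of freedom.

# pooled "internal + external" estimate of sigma
pooled_sd = function(x, s_v, v) {
  n = length(x)
  sqrt((sum((x - mean(x))^2) + v * s_v^2) / (n - 1 + v))
}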

Our use

We will be using the Externally Studentized Range…

Specifically, our definition resembles the one used for a balanced one-way ANOVA, described next.

Tukey’s Studentized Range Test Statistic in ANOVA

Let \({ Y_{ ij } } \mathop \sim\limits^{ iid } N\left( { 0 ,{ \sigma ^2 } } \right)\), where \( j=1,\cdots,n \) and \( i=1,\cdots,k \), be independent observations in a balanced one-way ANOVA with \(k\) treatments (equal treatment means, taken to be 0 without loss of generality).

Then \(\bar{Y}_1, \cdots , \bar{Y}_k\) are the treatment sample averages and \(S^2\) is an independent, unbiased estimator of \(\sigma^2\) based on \(v = k\left( { n-1 } \right)\) degrees of freedom.

Let $ W $ be the range of the $ \bar{ Y }_{ i\cdot } $. Then

$$q_{ k,v } = \frac{ W }{ S/\sqrt{ n } } = \frac{ \max_i \left( \bar{ Y }_{ i\cdot } \right)-\min_i \left( \bar{ Y }_{ i\cdot } \right) }{ \sqrt{ MS_{ error }/n } }$$
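A generic helper for this statistic could be sketched as follows; the name tukey_q and its arguments are illustrative, and a balanced design with n observations per treatment is assumed.

# studentized range statistic for a balanced one-way ANOVA
# y: response vector, g: treatment labels
tukey_q = function(y, g) {
  means = tapply(y, g, mean)
  n     = length(y) / nlevels(factor(g))               # per-treatment sample size (balanced)
  mse   = anova(lm(y ~ factor(g)))["Residuals", "Mean Sq"]
  (max(means) - min(means)) / sqrt(mse / n)
}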

Example - Data

Consider a dosage regimen with repeated measures:

           Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Row Total
Dose 1         27.0      26.2      28.8      33.5      28.8      144.3
Dose 2         22.8      23.1      27.7      27.6      24.0      125.2
Dose 3         21.9      23.4      20.1      27.8      19.3      112.5
Dose 4         23.5      19.6      23.7      20.8      23.9      111.5
Col Total      95.2      92.3     100.3     109.7      96.0      493.5

In R

(dose_type = c(rep("1",5), rep("2",5), rep("3",5), rep("4",5)))
##  [1] "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "3" "3" "3" "3" "3" "4" "4"
## [18] "4" "4" "4"
# dose-by-sample measurements from the table above
data = rbind(c(27.0, 26.2, 28.8, 33.5, 28.8), c(22.8, 23.1, 27.7, 27.6, 24.0),
             c(21.9, 23.4, 20.1, 27.8, 19.3), c(23.5, 19.6, 23.7, 20.8, 23.9))
(samples = c(data[1,], data[2,], data[3,], data[4,]))
##  [1] 27.0 26.2 28.8 33.5 28.8 22.8 23.1 27.7 27.6 24.0 21.9 23.4 20.1 27.8
## [15] 19.3 23.5 19.6 23.7 20.8 23.9

ANOVA Table:

model = glm(samples~factor(dose_type))
aov(model)
## Call:
##    aov(formula = model)
## 
## Terms:
##                 factor(dose_type) Residuals
## Sum of Squares           140.0935  116.3240
## Deg. of Freedom                 3        16
## 
## Residual standard error: 2.69634
## Estimated effects may be unbalanced
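Using the error mean square from this fit, the studentized range statistic itself can be computed directly and compared with the 5% critical value (a quick sketch reusing samples and dose_type from above):

means = tapply(samples, dose_type, mean)    # 28.86 25.04 22.50 22.30
W     = max(means) - min(means)             # 6.56
q_obs = W / sqrt((116.324 / 16) / 5)        # W / sqrt(MS_error / n), roughly 5.44
q_obs > qtukey(0.95, nmeans = 4, df = 16)   # TRUE: exceeds the critical value 4.046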

Tukey’s 95% C.I. pairwise comparison process

All pairwise differences $ \mu_{ i }-\mu_{ j } $ are covered simultaneously by the $ 100 \times \left({ 1-\alpha } \right) $% confidence intervals

$$\left( { { \bar Y }_{ i } } - { { \bar Y }_{ j } } \right) \pm \frac{ q_{ \alpha ;J,N - J } }{ \sqrt 2 } \cdot s_{ pooled } \cdot \sqrt { \frac{ 1 }{ n_{ i } } + \frac{ 1 }{ n_{ j } } } ,{ \text{ } }s_{ pooled } = \sqrt { MSW } $$

Note: always start with the pair formed by the largest and the smallest means; if that difference is not significant, then no difference between means lying between the largest and smallest will be significant either.

Largest vs. Smallest

$$\left( { 28.86 - 22.30 } \right) \pm \frac{ q_{ .05 ;4,20 - 4 } }{ \sqrt 2 } \cdot \sqrt{ \left(116.324/16\right) } \cdot \sqrt { \frac{ 1 }{ 5 } + \frac{ 1 }{ 5 } } , $$

$q_{ .05 ;4,20 - 4 } = qtukey(0.95,4,20-4) = $ 4.046093

\(6.56 \pm 4.878941\), i.e. roughly \(\left(1.68, 11.44\right)\). The interval excludes zero, so the largest-versus-smallest difference is significant.
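The same margin can be reproduced in R in a couple of lines (the numbers come from the ANOVA output above):

mse    = 116.324 / 16                          # mean square error on 16 df
margin = qtukey(0.95, nmeans = 4, df = 16) / sqrt(2) * sqrt(mse) * sqrt(1/5 + 1/5)
c(6.56 - margin, 6.56 + margin)                # matches the 4-1 interval from TukeyHSD (up to sign)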

In R:

TukeyHSD(aov(model))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = model)
## 
## $`factor(dose_type)`
##      diff        lwr       upr     p adj
## 2-1 -3.82  -8.698941  1.058941 0.1546383
## 3-1 -6.36 -11.238941 -1.481059 0.0088837
## 4-1 -6.56 -11.438941 -1.681059 0.0069985
## 3-2 -2.54  -7.418941  2.338941 0.4661560
## 4-2 -2.74  -7.618941  2.138941 0.4026925
## 4-3 -0.20  -5.078941  4.678941 0.9993976

So, only the 3-1 and 4-1 comparisons are significant at the 5% level.
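For a quick visual summary, the intervals can also be plotted with base R’s plot method for TukeyHSD objects:

plot(TukeyHSD(aov(model)))   # intervals that do not cross zero are the significant pairs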

That’s All folks!

That just about covers the Tukey Test Statistic…. The only bits left are an alternate formulation and the references used to construct this post.

Tukey’s Test Statistic Rewritten

Let \({ X_1 }, \cdots ,{ X_m } \mathop \sim\limits^{ iid } N\left( { 0 ,{ \sigma ^2 } } \right)\), where $ m \ge 2 $, and let $ Z $ be $ \chi^{ 2 } $ with $ n $ degrees of freedom, independent of the $ X_i $.

$$ q = \frac{ { \mathop { \max }\limits_{ 1 \leqslant i,j \leqslant m } \left| { { X_i } - { X_j } } \right| } }{ { \sqrt { Z/n } } } = \frac{ { { X_{ m:m } } - { X_{ 1:m } } } }{ { \sqrt { Z/n } } }$$

In the case of $ m=2 $, $ q $ closely resembles the two sample $ t $ test statistic.

That is, $ X_{ 1 } $ and $ X_{ 2 } $ are taken to be the standardized sample means of the two samples and $ Z/n $ is the pooled sample variance, $ S_p^2 = \frac{ { S_1^2\left( { { n_1 } - 1 } \right) + S_2^2\left( { { n_2 } - 1 } \right) } }{ { { n_1 } + { n_2 } - 2 } },{ \text{ } }{ S_E } = { S_p }\sqrt { \frac{ 1 }{ { { n_1 } } } + \frac{ 1 }{ { { n_2 } } } } $
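This connection can be checked numerically: with two means, the studentized range CDF is a rescaled two-sided $ t $ probability, i.e. $ q = \sqrt{2}\,|t| $ (the values 2.5 and 10 below are arbitrary):

q = 2.5; df = 10                    # arbitrary illustrative values
ptukey(q, nmeans = 2, df = df)      # studentized range CDF for two means
2 * pt(q / sqrt(2), df = df) - 1    # the same probability via the t distribution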

References

  • David, H. A. “Studentized Range.” Encyclopedia of Statistical Sciences. New York: Wiley, 2006. 1-3. Print.

  • Falk, Michael, and Frank Marohn. “The One-Way Analysis of Variance.” Foundations of statistical analyses and applications with SAS. Basel: Birkhauser Verlag, 2002. 193-194. Print.

  • Harter, H. Leon, and N. Balakrishnan. “The Studentized Range of Samples from a Normal Population.” Tables for the use of range and studentized range in tests of hypotheses. Boca Raton, Fla.: CRC Press, 1998. 52-53. Print.

  • Hochberg, Yosef. “Studentized Range.” Encyclopedia of Biostatistics. Hoboken, NJ: John Wiley & Sons, Ltd, 2005. 1-3. Print.

  • Lowry, Richard. “Ch 14: One-Way Analysis of Variance for Independent Samples.” Concepts & Applications of Inferential Statistics. 2000. 1-3. Print.
