Introduction
Experimental design is a fundamental aspect of statistical analysis,
allowing researchers to systematically investigate the effects of
different treatments on a response variable. In CRD, experimental units
are randomly assigned to different treatment groups.

In this note, we will explore how Classical Analysis of Variance
(ANOVA) and Regression Analysis can be used to analyze data from a CRD.
We will also illustrate these methods using some numerical examples.
For
Working Data in Plant Physiology
Our example data are from an experiment in plant physiology,
published by Sokal and Rohlf (1995). The lengths of pea sections (the
dependent, or response, variable) grown in a tissue culture were
recorded. The purpose of the experiment was to test the effects of
various sugar media (the independent, or explanatory, variable) on mean
pea section length. A balanced CRD was used with 10
replicates per treatment level.
The data are given in the table below:

The wide-format table is easy to use when performing manual
calculations. Most of the software programs use long-format data tables.
We will reshape this wide table when using the R program for various
analyses.
Objective and Logic of
Analyzing CRD Data
Before moving to data analysis, we first discuss the objective,
logic, and methods for analyzing CRD data. The
The objective and
Logic
The core goals when analyzing Completely Randomized Design (CRD) data
are threefold:
Treatment Effect Detection - Determine if
different treatments produce statistically significant differences in
the response variable. For example, do fertilizers A, B, and C result in
different crop yields?
Effect Size Quantification - Measure the
magnitude of treatment differences. For example, Fertilizer C increases
yield by 13 bushels/acre compared to Fertilizer A
Ranking/Comparison of Treatments Identify which
treatments perform best/worst. This involves multiple comparisons among
means across treatments. For example, we can order fertilizers by yield:
A > B > A through multiple comparisons.
Statistically, treatment Effect Detection in CRD
with \(k\) treatments can be achieved
by testing a hypothesis.
\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_k \ \ \text{vs} \ \ H_a: \ \text{at
least one mean differs}
\]
If \(H_0\) is
rejected, we quantify the size of treatment effects and
rank the treatment effects across the treatments in the CRD. There are
different approaches to testing the above hypothesis
Measuring Discrepancy
between \(H_0\) and \(H_a\)
Let’s use the following balanced CRD data table and related
notations.

The three different errors defined above have the following
relationship.
\[
(y_{ij} - \bar{y}_{\cdot\cdot}) = (y_{ij} - \bar{y}_{i\cdot}) +
(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})
\]
The above expression simply means that we can
decompose the overall residual (\(y_{ij} - \bar{y}_{\cdot\cdot}\)) into
within treatment error (\((y_{ij} -
\bar{y}_{i\cdot})\), also called within residual error) and
between treatment error (\(\bar{y}_{i\cdot} -
\bar{y}_{\cdot\cdot}\)). Since an error could be positive or
negative. Similar to the definition of variance, we square all these
errors to obtain squared errors.
After some tedous
algebra (i.e., not straightforward at all!), we have the
following equation
\[
\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2=
\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2 +
\sum_{i=1}^t\sum_{j=1}^{n_i}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
\]
The last term in the above equation has no subscript \(j\); we simplify the last term to get the
following equation.
\[
\underbrace{\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} -
\bar{y}_{\cdot\cdot})^2}_{SS_T} =\underbrace{
\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2}_{SS_W} +
\underbrace{\sum_{i=1}^t n_i(\bar{y}_{i\cdot} -
\bar{y}_{\cdot\cdot})^2}_{SS_B}
\] where \(SS_T\) =
Total Sum of Squared Residuals, \(SS_W\) = Sum of Squared Errors
within treatments and \(SS_B\)
= Sum of Squared Errors between treatments.
Recall the definition of sample
variance
\[
s^2 = \frac{(x_1-\bar{x})^2 +(x_2-\bar{x})^2 + \cdots _
(x_n-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1},
\]
Which is average squared
deviation (also called mean squared deviation
MSD or mean squared error, MSE)! The denominator \(n -1\) in the above definition reflects the
degrees of freedom - meaning that number of independent
observations (\(n\)) minus
one constratint (\(\bar{x} =
\sum_{i=1}^n x_i/n\), each equation based on the \(n\) independent observatiob is considered
as a contraint!).
Let’s look at \(SS_B\): There are
\(t\) independent means (one for each
treatment) \(\{\bar{y}_{1\cdot},
\bar{y}_{2\cdot}, \bar{y}_{3\cdot}, \cdots, \bar{y}_{t\cdot}
\}\), the grand total \(\bar{y}_{\cdot\cdot}\) is consider a
constraint. Therefore, there are $ t-1$ degrees of freedom associated
with \(SS_B\). Therefore, the mean
squared error of the between errors is defined by
\[
MS_B = \frac{SS_B}{t-1}
\]
Similarly, in \(SS_W\), there \(n_T\) independent observations \(y_{ij}\) and \(t\) constraints \(\{\bar{y}_{1\cdot}, \bar{y}_{2\cdot},
\bar{y}_{3\cdot}, \cdots, \bar{y}_{t\cdot} \}\)
(Yes! since each of them is the mean of
observations within each treatment). Therefore, the
degrees of freedom in \(SS_W\) is \(n_T-t\), Therefore, the mean square error
of within errors is defined by
\[
MS_W = \frac{SS_W}{n_T - t}.
\]
Remarks:
Here is the logic for how to
define a measure to assess the discrepancy between \(H_0\) and \(H_a\):
If there is no real difference
between treatments (i.e., \(H_0\) is
true), \(MS_B\) and \(MS_W\) both estimate the same underlying
variance. Therefore \(MS_B \approx
MS_W\)!
We can construct the test statistic based on \(MS_B\) and \(MSE_B\) under \(H_0\). Mathematically, both \(MS_B-MS_W\) (\(\approx 0\)) and \(MS_B/MS_W\) (\(\approx 1\)) are valid measures to assess
the discepancy between \(H_0\) and
\(H_a\). Since both quantities are
random (since they are evaluated based on the random sample), we need to
choose the one with a known probability distribution.
The ratio expression \(MS_B/MS_W\) has
a well-known distribution - F distribution!
Basics of F
Distribution
We have discussed several distributions to characterize test
statistics for the z-test, t-test, and chi-square test. One of the
essential tools in hypothesis testing involving the comparison of
variances (the case we introduced above) is the F-distribution. This
subsection outlines the basics of the F-distribution and demonstrates
how to use R to compute critical values and p-values associated with
F-tests.
The F-distribution is a continuous probability distribution used for
the comparison of two sample variances (such as \(SS_B\) and \(SS_W\)) and regression analysis (will
discuss this later). Using the above quantities, we have
\[
F = \frac{SS_W/(n_T-t)}{SS_B/(t-1)} \rightarrow F_{n_T-t, \ t-1}
\]
\(n_T-t\) is the degrees of freedom
of the numerator and \(t-1\) is the degrees of the
denominator. We can also flip the ratio to get another
valid test statistic
\[
F^\prime = \frac{SS_B/(t-1)}{SS_W/(n_T-t)} \rightarrow F_{\ t-1, n_T-t}
\]
Caution: the degrees of freedom
in the subscript MUST be adjusted so that the first index is the degrees
of freedom of the numerator and the second subscript represents the
degrees of freedom of the denominator.
This means \(F_{7,5}\) and \(F_{5, 7}\) are two different F
distributions!
\(F_{df_N, df_D}\) is skewed to the
right. The following figure shows the density curve of an F-distribution
with degrees of freedom df1 = 7 (numerator) and df2 = 10 (denominator),
together with the R commands to find the critical and p-values.

The syntax of the two R functions used to find the p-value and
critical value is annotated in the following figure.

Most introductory statistics textbooks include an F table based on a
few commonly used significance levels \(\alpha\). For example, the critical value
of \(F_{7, 10}\) at \(\alpha = 0.05\) labeled in the left panel
of the density curve of \(F_{7, 10}\)
can also be found from the following F table

The F tables are constructed based on a few significance levels. It
is not to use use F table to find p-values.
Next, we open a standalone section to formally introduce the one-way
ANOVA for CRD data.
One-Way ANOVA and
Implementation
We have discussed the logic and valid measures for assessing the
discrepancy between \(H_0\) and \(H_a\). Next, we summarize the above results
in a classical one-way ANOVA table and test the following null
hypothesis based on the CRD.
\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_t \ \ \text{vs} \ \ H_a: \text{at
least one mean differs}
\]
The test statistic and corresponding p-value are summarized in the
following ANOVA table.
Between |
\(t-1\) |
\(SS_B\) |
\(MS_B=SS_B/(t-1)\) |
\(MS_W/MS_B\) |
|
Within |
\(n_T - t\) |
\(SS_W\) |
\(MS_W =
SS_W/(n_T-t)\) |
|
|
Total |
\(n_T-`\) |
\(SS_T\) |
|
|
|
Decision Rule: If the p-value in the above table is
less than the significance level \(\alpha\), we reject the null hypothesis;
otherwise, we conclude the null hypothesis.
Next, we use the plant physiology data to
demonstrate to to construct the above ANOVA table and use R as a
calculator to find \(SS\) and \(MS\) and related quantities in the ANOVA
table. The following code loads the data into R in the same format
(i.e., wide table format)
# Create the data vectors
Control <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68) # Control
Glucose <- c(57, 58, 59, 59, 62, 60, 60, 57, 59, 61) # Glucose
Fructose <- c(58, 61, 56, 58, 57, 56, 58, 60, 57, 58) # Fructose
GlucFruc <- c(58, 59, 58, 61, 57, 56, 58, 57, 57, 59) # GlucFruc
Sucrose <- c(62, 66, 65, 63, 64, 62, 65, 65, 62, 67) # Sucrose
## using cbind() to display the data
cbind(Control, Glucose, Fructose, GlucFruc, Sucrose)
| Control Glucose Fructose GlucFruc Sucrose
| [1,] 75 57 58 58 62
| [2,] 67 58 61 59 66
| [3,] 70 59 56 58 65
| [4,] 75 59 58 61 63
| [5,] 65 62 57 57 64
| [6,] 71 60 56 56 62
| [7,] 67 60 58 58 65
| [8,] 67 57 60 57 65
| [9,] 76 59 57 57 62
| [10,] 68 61 58 59 67
#colnames(wide.table) <- c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose")
The next code chunk calculates different quantities in the ANOVA
table. I will make comments in the code to show detailed steps. The
primary R commands are sum()
, mean()
, and
related R functions of the F-distribution.
## degrees of freedom
t = 5
n.T = 10*5
dfN = t - 1
dfD = n.T - t
ybar.. = mean(c(Control, Glucose, Fructose, GlucFruc, Sucrose))
## Sum of squares
SS.B = 10*sum(c((mean(Control)-ybar..)^2, (mean(Glucose)-ybar..)^2, (mean(Fructose)-ybar..)^2, (mean(GlucFruc)-ybar..)^2, (mean(Sucrose)-ybar..)))
SS.W = sum(c(sum((Control-mean(Control))^2), sum((Glucose-mean(Glucose))^2), sum((Fructose-mean(Fructose))^2), sum((GlucFruc-mean(GlucFruc))^2), sum((Sucrose-mean(Sucrose))^2)))
SS.T = sum((c(Control, Glucose, Fructose, GlucFruc, Sucrose) - ybar..)^2)
## MS
MS.B = SS.B/dfN
MS.W = SS.W/dfD
## F value
F.value = MS.B/MS.W
p.val = pf(F.value, dfN, dfD, lower.tail = FALSE)
anova.out <- data.frame(
DF = c(dfN, dfD),
SS = c(SS.B, SS.W),
MS = c(MS.B, MS.W),
F.value = c(F.value, NA),
p.value = c(p.val, NA)
)
rownames(anova.out)=c("Between", "Within")
print(anova.out, na.print = "")
| DF SS MS F.value p.value
| Between 4 1077.944 269.486000 51.31981 3.324636e-16
| Within 45 236.300 5.251111 NA NA
Conclusion: The p-value is approximately 0. We
reject the null hypothesis that all means are equal. In other words, the
mean pea section length is affected by various sugar media.
The second part of the following YouTube video (after 15 Minutes, https://www.youtube.com/watch?v=KJ5G2KjcXcA) discussed
in the next note on One-way ANOVA.
ANOVA with R Function
aov()
We still use plant physiology data to illustrate
one-way ANOVA analysis using aov()
. Since
aov()
requires a long table structure. That is, all
measurements of the response must be stored in a column and a standalone
column to label the treatment of each value in the measurement column.
The following code creates a long table.
# Create the data vectors
lengths <- c(
75, 67, 70, 75, 65, 71, 67, 67, 76, 68, # Control
57, 58, 59, 59, 62, 60, 60, 57, 59, 61, # Glucose
58, 61, 56, 58, 57, 56, 58, 60, 57, 58, # Fructose
58, 59, 58, 61, 57, 56, 58, 57, 57, 59, # GlucFruc
62, 66, 65, 63, 64, 62, 65, 65, 62, 67 # Sucrose
)
# Create the treatment factor
treatment <- rep(c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose"), each = 10)
# Combine into a data frame
pea.long <- data.frame(
Length = lengths,
Treatment = factor(treatment, levels = c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose"))
)
# View the first 20 rows to make sure the data structure is correct
head(pea.long, n=20)
| Length Treatment
| 1 75 Control
| 2 67 Control
| 3 70 Control
| 4 75 Control
| 5 65 Control
| 6 71 Control
| 7 67 Control
| 8 67 Control
| 9 76 Control
| 10 68 Control
| 11 57 Glucose
| 12 58 Glucose
| 13 59 Glucose
| 14 59 Glucose
| 15 62 Glucose
| 16 60 Glucose
| 17 60 Glucose
| 18 57 Glucose
| 19 59 Glucose
| 20 61 Glucose
We next call aov()
to perform ANOVA analysis.
## create an anova model object
aov.model <- aov(Length ~ Treatment, data = pea.long)
## create the classical ANOVA table
anova(aov.model)
| Analysis of Variance Table
|
| Response: Length
| Df Sum Sq Mean Sq F value Pr(>F)
| Treatment 4 1105.7 276.430 52.642 < 2.2e-16 ***
| Residuals 45 236.3 5.251
| ---
| Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## or alternatively, you can simply use summary(aov.model) to produce the same results!
The annotated output is given by

Conclusion: Since the p-value is approximately 0,
the null hypothesis is rejected. That is, the various sugar media (the
independent, or explanatory, variable) significantly affect the mean pea
section length.
The following YouTube video ( https://www.youtube.com/watch?v=IkR3Rzrgiv4) gives
another example of using the R function aov()
. The video
also uses some graphical functions to create some visualizations. We did
not discuss visualizations in this class, but we will cover more
visualizations in subsequent statistics courses.
Multiple
Comparisons
When analyzing variance (ANOVA), a significant F-test (i.e., p-value
is less than the given significance level \(\alpha\)) indicates that at least one group
mean differs from the others. However, ANOVA does not specify which
pairs of groups are significantly different. Post-hoc
tests, also known as multiple comparison
tests, are used to identify these specific differences while
controlling for the increased risk of Type I errors (false positives)
that arise from performing multiple comparisons.
There are different Post-hoc tests. We only
introduce two commonly used post-hoc methods using R built-in
functions:
Tukey’s Honest Significant Difference (HSD)
-Controls family-wise error rate (FWER) for all
pairwise comparisons.
Bonferroni Correction - Adjusts p-values by
multiplying them by the number of comparisons
Visual Comparison
We have learned different graphical summaries of data. Box-plot is a
geometric representation of the 5-number-summary of a numerical
variable. Since treatment variable has multiple categories, we can
create a back-to-back box-plot to visualize the different in the
distributions.
The R function boxplot()
can draw multiple boxplots of a
numerical variable according the categories of the group variable. Since
package {graphics} comes with the base R, installation
and loading the package is unnecessary.
boxplot(Length ~ Treatment, data = pea.long)

The above box-plots clearly show that the pea lengths using sugar
media Fructose, Glucose, and GlucFruc are similar to each other.
Tukey’s Honest
Significant Difference (HSD)
Tukey’s Honest Significant Difference (HSD) is a post-hoc test used
after a statistically significant ANOVA to determine which specific
group means differ from each other. Unlike multiple t-tests, which
inflate the Type I error rate, Tukey’s HSD adjusts for multiple
comparisons, maintaining the family-wise error rate (FWER) at the
desired level (e.g., 0.05).
Next, we perform a multiple comparison of pea length across different
sugar media using the plant physiology data. For
convenience, we use the long-format data frame defined earlier and an
anova object in the following R code.
## refit the ANOVA model
aov.model <- aov(Length ~ Treatment, data = pea.long)
## multiple comparison
tukey.comp <- TukeyHSD(aov.model)
print(tukey.comp)
| Tukey multiple comparisons of means
| 95% family-wise confidence level
|
| Fit: aov(formula = Length ~ Treatment, data = pea.long)
|
| $Treatment
| diff lwr upr p adj
| Glucose-Control -10.9 -13.811928 -7.988072 0.0000000
| Fructose-Control -12.2 -15.111928 -9.288072 0.0000000
| GlucFruc-Control -12.1 -15.011928 -9.188072 0.0000000
| Sucrose-Control -6.0 -8.911928 -3.088072 0.0000050
| Fructose-Glucose -1.3 -4.211928 1.611928 0.7114379
| GlucFruc-Glucose -1.2 -4.111928 1.711928 0.7676225
| Sucrose-Glucose 4.9 1.988072 7.811928 0.0001778
| GlucFruc-Fructose 0.1 -2.811928 3.011928 0.9999786
| Sucrose-Fructose 6.2 3.288072 9.111928 0.0000026
| Sucrose-GlucFruc 6.1 3.188072 9.011928 0.0000036
The results indicate that there is no significant difference in pairs
Fructose-Glucose, GlucFruc-Glucose, and GlucFruc-Fructose.
par(mai=c(1.5,2,1,1)) # Makes room on the plot for the group names
plot(tukey.comp, # name of the TukeyHSD() object
cex.lab = 0.6, # adjust the font size of the labels of the vertical axis
las = 1) # orientation of the labels

Bonferroni
Comparison
The Bonferroni correction is a conservative method for adjusting
significance levels when performing multiple comparisons to reduce the
risk of Type I errors (false positives). It is one of the simplest and
most widely used approaches for controlling the family-wise error
rate.
The R function pairwise.t.test()
performs pairwise
comparison. It requires inputting individual variables (response and
treatment group). The following is code for the Bonferroni
procedure.
Bonfeeroni.result <- pairwise.t.test(
x = pea.long$Length,
g = pea.long$Treatment,
p.adjust.method = "bonferroni"
)
print(Bonfeeroni.result)
|
| Pairwise comparisons using t tests with pooled SD
|
| data: pea.long$Length and pea.long$Treatment
|
| Control Glucose Fructose GlucFruc
| Glucose 7.3e-13 - - -
| Fructose 1.7e-14 1.00000 - -
| GlucFruc 2.2e-14 1.00000 1.00000 -
| Sucrose 5.1e-06 0.00019 2.6e-06 3.7e-06
|
| P value adjustment method: bonferroni
Using the p-value adjusted Bonferroni procedure, three pairs of
treatment groups are significantly different: Fructose-Glucose,
GlucFruc-Glucose, and GlucFruc-Fructose. This result is consistent with
Tukeu’s HSD method.
Before conclude this section, we recommend a YouTube video (https://www.youtube.com/watch?v=hPUvqtmCu7Q) with an
example one multiple comparison among group means using R,
---
title: "Analysis of Variance for Completely Randomized Design"
author: "Cheng Peng"
date: "STA200 Statistics II "
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: show
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 5
    fig_height: 4
---

```{=html}

<style type="text/css">

div#TOC li {
    list-style:none;
    background-image:none;
    background-repeat:none;
    background-position:0;
}
h1.title {
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}
h4.author { /* Header 4 - and the author and data headers use this too  */
    font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}
h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 3 - and the author and data headers use this too  */
    font-size: 20px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}
h2 { /* Header 3 - and the author and data headers use this too  */
    font-size: 18px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

</style>
```



```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
# knitr::opts_knit$set(root.dir = "C:/Users/75CPENG/OneDrive - West Chester University of PA/Documents")
# knitr::opts_knit$set(root.dir = "C:\\STA490\\w05")

knitr::opts_chunk$set(echo = TRUE,       
                      warning = FALSE,   
                      result = TRUE,   
                      message = FALSE,
                      comment = "|")
```



\


# Introduction

Experimental design is a fundamental aspect of statistical analysis, allowing researchers to systematically investigate the effects of different treatments on a response variable. In CRD, experimental units are randomly assigned to different treatment groups.


```{r echo = FALSE, fig.align='center', out.width="70%"}
include_graphics("week04/CRD.png")
```

In this note, we will explore how Classical Analysis of Variance (ANOVA) and Regression Analysis can be used to analyze data from a CRD. We will also illustrate these methods using some numerical examples. For


**Working Data in Plant Physiology**

Our example data are from an experiment in plant physiology, published by Sokal and Rohlf (1995). The lengths of pea sections (the dependent, or response, variable) grown in a tissue culture were recorded. The purpose of the experiment was to test the effects of various sugar media (the independent, or explanatory, variable) on mean pea section length. A **balanced CRD** was used with 10 replicates per treatment level.

The data are given in the table below:


```{r echo = FALSE, fig.align='center', out.width="70%"}
include_graphics("week04/ANOVA-ExampleData.png")
```



The wide-format table is easy to use when performing manual calculations. Most of the software programs use long-format data tables. We will reshape this wide table when using the R program for various analyses.




# Objective and Logic of Analyzing CRD Data

Before moving to data analysis, we first discuss the objective, logic, and methods for analyzing CRD data. The 

## The objective and Logic

The core goals when analyzing Completely Randomized Design (CRD) data are threefold:

* **Treatment Effect Detection** - Determine if different treatments produce statistically significant differences in the response variable. For example, do fertilizers A, B, and C result in different crop yields?

* **Effect Size Quantification** - Measure the magnitude of treatment differences. For example, Fertilizer C increases yield by 13 bushels/acre compared to Fertilizer A

* **Ranking/Comparison of Treatments** Identify which treatments perform best/worst. This involves multiple comparisons among means across treatments. For example, we can order fertilizers by yield: A > B > A through multiple comparisons.


Statistically, **treatment Effect Detection** in CRD with $k$ treatments can be achieved by testing a hypothesis.

$$
H_0: \mu_1 = \mu_2 = \cdots = \mu_k \ \ \text{vs} \ \  H_a: \ \text{at least one mean differs}
$$

If $H_0$ is **rejected**, we quantify the size of treatment effects and rank the treatment effects across the treatments in the CRD. There are different approaches to testing the above hypothesis


## Measuring Discrepancy between $H_0$ and $H_a$

Let's use the following balanced CRD data table and related notations.


```{r echo = FALSE, fig.align='center', out.width="90%"}
include_graphics("week04/GenericCRD-DataTable.png")
```

The three different errors defined above have the following relationship.

$$
(y_{ij} - \bar{y}_{\cdot\cdot}) = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) 
$$

The above expression simply means that we can **decompose** the overall residual ($y_{ij} - \bar{y}_{\cdot\cdot}$) into within treatment error ($(y_{ij} - \bar{y}_{i\cdot})$, also called within residual error) and between treatment error ($\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$). Since an error could be positive or negative. Similar to the definition of variance, we square all these errors to obtain squared errors. 

<font color = "red" size = 4>**\color{red}After some tedous algebra** (i.e., not straightforward at all!)</font>, we have the following equation

$$
\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2= \sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^t\sum_{j=1}^{n_i}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 
$$

The last term in the above equation has no subscript $j$; we simplify the last term to get the following equation.

$$
\underbrace{\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2}_{SS_T}  =\underbrace{ \sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2}_{SS_W}  + \underbrace{\sum_{i=1}^t n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}_{SS_B}  
$$
where $SS_T$ = **Total Sum of Squared Residuals**, $SS_W$ = **Sum of Squared Errors within treatments** and $SS_B$ = **Sum of Squared Errors between treatments**.


<font color = "red" size = 5>**\color{red} Recall the definition of sample variance**</font>

$$
s^2 = \frac{(x_1-\bar{x})^2 +(x_2-\bar{x})^2 + \cdots _ (x_n-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1},
$$

Which is <font color = "red">**\color{red}average squared deviation**</font> (also called **mean squared deviation MSD or mean squared error, MSE**)! The denominator $n -1$ in the above definition reflects the **degrees of freedom** - meaning that number of independent observations ($n$) minus **one** constratint ($\bar{x} = \sum_{i=1}^n x_i/n$, each equation based on the $n$ independent observatiob is considered as a contraint!).

Let's look at $SS_B$: There are $t$ independent means (one for each treatment) $\{\bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \bar{y}_{3\cdot}, \cdots, \bar{y}_{t\cdot} \}$, the grand total $\bar{y}_{\cdot\cdot}$ is consider a constraint. Therefore, there are $ t-1$ degrees of freedom associated with $SS_B$. Therefore, the mean squared error of the between errors is defined by

$$
MS_B = \frac{SS_B}{t-1}
$$





Similarly, in $SS_W$, there $n_T$ independent observations $y_{ij}$ and $t$ constraints $\{\bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \bar{y}_{3\cdot}, \cdots, \bar{y}_{t\cdot} \}$ (<font color = "red">**Yes! since each of them is the mean of observations within each treatment**)</font>. Therefore, the degrees of freedom in $SS_W$ is $n_T-t$, Therefore, the mean square error of within errors is defined by

$$
MS_W = \frac{SS_W}{n_T - t}.
$$

<font color = "red">**\color{red}Remarks**</font>:

* $MS_B$ is the estimated sample variance of the **sample means** of individual treatments.

* $MS_W$ is the weighted average of the variances of each individual treatment.




<font color = "red" size = 4>**\color{red}Here is the logic for how to define a measure to assess the discrepancy between $H_0$ and $H_a$:**</font>

<font color = "blue" size = 5>**\color{blue}If there is no real difference between treatments (i.e., $H_0$ is true), $MS_B$ and $MS_W$ both estimate the same underlying variance. Therefore $MS_B \approx MS_W$!**</font>

\

We can construct the test statistic based on $MS_B$ and $MSE_B$ under $H_0$. Mathematically, both $MS_B-MS_W$ ($\approx 0$) and $MS_B/MS_W$ ($\approx 1$) are valid measures to assess the discepancy between $H_0$ and $H_a$. Since both quantities are random (since they are evaluated based on the random sample), we need to choose the one with a known **probability distribution**. The ratio expression $MS_B/MS_W$ has a well-known distribution - **F distribution**!

\

## Basics of F Distribution

We have discussed several distributions to characterize test statistics for the z-test, t-test, and chi-square test. One of the essential tools in hypothesis testing involving the comparison of variances (the case we introduced above) is the F-distribution. This subsection outlines the basics of the F-distribution and demonstrates how to use R to compute critical values and p-values associated with F-tests.

The F-distribution is a continuous probability distribution used for the comparison of two sample variances (such as $SS_B$ and $SS_W$) and regression analysis (will discuss this later). Using the above quantities, we have

$$
F = \frac{SS_W/(n_T-t)}{SS_B/(t-1)} \rightarrow F_{n_T-t, \ t-1}
$$

$n_T-t$ is the degrees of freedom of the **numerator** and $t-1$ is the degrees of the **denominator**. We can also flip the ratio to get another valid test statistic


$$
F^\prime = \frac{SS_B/(t-1)}{SS_W/(n_T-t)} \rightarrow F_{\ t-1, n_T-t}
$$

<font color = "red">**\color{red} Caution:** the degrees of freedom in the subscript MUST be adjusted so that the first index is the degrees of freedom of the numerator and the second subscript represents the degrees of freedom of the denominator. </font> <font color = "red">**\color{red} This means $F_{7,5}$ and $F_{5, 7}$ are two different F distributions!**</font>


$F_{df_N, df_D}$ is skewed to the right. The following figure shows the density curve of an F-distribution with degrees of freedom df1 = 7 (numerator) and df2 = 10 (denominator), together with the R commands to find the critical and p-values.

```{r echo = FALSE, fig.align='center', out.width="99%"}
include_graphics("week04/F-densityCurve.png")
```

The syntax of the two R functions used to find the p-value and critical value is annotated in the following figure.

```{r echo = FALSE, fig.align='center', out.width="90%"}
include_graphics("week04/F-DisRSyntax.png")
```

Most introductory statistics textbooks include an F table based on a few commonly used significance levels $\alpha$. For example, the critical value of $F_{7, 10}$ at $\alpha = 0.05$ labeled in the left panel of the density curve of $F_{7, 10}$ can also be found from the following F table

```{r echo = FALSE, fig.align='center', out.width="90%"}
include_graphics("week04/F-table.png")
```

The F tables are constructed based on a few significance levels. It is not to use use F table to find p-values. 

Next, we open a standalone section to formally introduce the one-way ANOVA for CRD data.

\

# One-Way ANOVA and Implementation

We have discussed the logic and valid measures for assessing the discrepancy between $H_0$ and $H_a$. Next, we summarize the above results in a classical one-way ANOVA table and test the following null hypothesis based on the CRD.

$$
H_0: \mu_1 = \mu_2 = \cdots = \mu_t \ \ \text{vs} \ \ H_a: \text{at least one mean differs}
$$

The test statistic and corresponding p-value are summarized in the following ANOVA table.

|  **Source**    | **DF** |   **SS**     |   **MS**  | **F-Value** | **P-Value**    |
|:---------------|:-------|:-------------|:-------------|:----------|:--------------|
| **Between**  | $t-1$    | $SS_B$       | $MS_B=SS_B/(t-1)$        | $MS_W/MS_B$ | |
| **Within**   | $n_T - t$| $SS_W$       | $MS_W = SS_W/(n_T-t)$    |             | |
| **Total**    | $n_T-`$  | $SS_T$       |

**Decision Rule**: If the p-value in the above table is less than the significance level $\alpha$, we reject the null hypothesis; otherwise, we conclude the null hypothesis.


Next, we use the **plant physiology data** to demonstrate to to construct the above ANOVA table and use R as a calculator to find $SS$ and $MS$ and related quantities in the ANOVA table. The following code loads the data into R in the same format (i.e., wide table format)

```{r}
# Create the data vectors
Control <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68)     # Control
Glucose <- c(57, 58, 59, 59, 62, 60, 60, 57, 59, 61)     # Glucose
Fructose <- c(58, 61, 56, 58, 57, 56, 58, 60, 57, 58)     # Fructose
GlucFruc <- c(58, 59, 58, 61, 57, 56, 58, 57, 57, 59)     # GlucFruc
Sucrose <- c(62, 66, 65, 63, 64, 62, 65, 65, 62, 67)     # Sucrose
## using cbind() to display the data
cbind(Control, Glucose, Fructose, GlucFruc, Sucrose)
#colnames(wide.table) <- c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose")
```


The next code chunk calculates different quantities in the ANOVA table. I will make comments in the code to show detailed steps. The primary R commands are `sum()`, `mean()`, and related R functions of the F-distribution.

```{r}
## degrees of freedom
t = 5
n.T = 10*5
dfN = t - 1
dfD = n.T - t
ybar.. = mean(c(Control, Glucose, Fructose, GlucFruc, Sucrose))
## Sum of squares
SS.B = 10*sum(c((mean(Control)-ybar..)^2, (mean(Glucose)-ybar..)^2, (mean(Fructose)-ybar..)^2, (mean(GlucFruc)-ybar..)^2, (mean(Sucrose)-ybar..)))
SS.W = sum(c(sum((Control-mean(Control))^2), sum((Glucose-mean(Glucose))^2), sum((Fructose-mean(Fructose))^2), sum((GlucFruc-mean(GlucFruc))^2), sum((Sucrose-mean(Sucrose))^2)))
SS.T = sum((c(Control, Glucose, Fructose, GlucFruc, Sucrose) - ybar..)^2)
## MS
MS.B = SS.B/dfN
MS.W = SS.W/dfD
## F value
F.value = MS.B/MS.W
p.val = pf(F.value, dfN, dfD, lower.tail = FALSE)
anova.out <- data.frame(
  DF = c(dfN, dfD),
  SS = c(SS.B, SS.W),
  MS = c(MS.B, MS.W),
  F.value = c(F.value, NA),
  p.value = c(p.val, NA)
)
rownames(anova.out)=c("Between", "Within")
print(anova.out, na.print = "")

```

**Conclusion**: The p-value is approximately 0. We reject the null hypothesis that all means are equal. In other words,  the mean pea section length is affected by various sugar media.

\


The second part of the following YouTube video  (after 15 Minutes, <https://www.youtube.com/watch?v=KJ5G2KjcXcA>) discussed in the next note on One-way ANOVA. 

\

<center><a href="https://www.youtube.com/watch?v=KJ5G2KjcXcA" target="popup" 
                   onclick="window.open('https://www.youtube.com/watch?v=KJ5G2KjcXcA',
                      'name','width=850,height=500')"><img src = "https://pengdsci.github.io/MAT121W5/img/VideoIcon.png" width="200" height="120"></a>
</center>

\










# ANOVA with R Function `aov()`

We still use **plant physiology data** to illustrate one-way ANOVA analysis using `aov()`. Since `aov()` requires a long table structure. That is, all measurements of the response must be stored in a column and a standalone column to label the treatment of each value in the measurement column. The following code creates a long table. 

```{r}
# Create the data vectors
lengths <- c(
  75, 67, 70, 75, 65, 71, 67, 67, 76, 68,     # Control
  57, 58, 59, 59, 62, 60, 60, 57, 59, 61,     # Glucose
  58, 61, 56, 58, 57, 56, 58, 60, 57, 58,     # Fructose
  58, 59, 58, 61, 57, 56, 58, 57, 57, 59,     # GlucFruc
  62, 66, 65, 63, 64, 62, 65, 65, 62, 67      # Sucrose
)

# Create the treatment factor
treatment <- rep(c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose"), each = 10)

# Combine into a data frame
pea.long <- data.frame(
  Length = lengths,
  Treatment = factor(treatment, levels = c("Control", "Glucose", "Fructose", "GlucFruc", "Sucrose"))
)

# View the first 20 rows to make sure the data structure is correct
head(pea.long, n=20)
```

We next call `aov()` to perform ANOVA analysis.

```{r}
## create an anova model object 
aov.model <- aov(Length ~ Treatment, data = pea.long)
## create the classical ANOVA table
anova(aov.model)
## or alternatively, you can simply use summary(aov.model) to produce the same results!
```

The annotated output is given by

```{r echo = FALSE, fig.align='center', out.width="60%"}
include_graphics("week04/Annotated-ANOVA-Output.png")
```

**Conclusion**:  Since the p-value is approximately 0, the null hypothesis is rejected. That is, the various sugar media (the independent, or explanatory, variable) significantly affect the mean pea section length.

\

The following YouTube video ( <https://www.youtube.com/watch?v=IkR3Rzrgiv4>) gives another example of using the R function `aov()`. The video also uses some graphical functions to create some visualizations. We did not discuss visualizations in this class, but we will cover more visualizations in subsequent statistics courses.

\

<center><a href="https://www.youtube.com/watch?v=IkR3Rzrgiv4" target="popup" 
                   onclick="window.open('https://www.youtube.com/watch?v=IkR3Rzrgiv4',
                      'name','width=850,height=500')"><img src = "https://pengdsci.github.io/MAT121W5/img/VideoIcon.png" width="200" height="120"></a>
</center>

\


# Multiple Comparisons

When analyzing variance (ANOVA), a significant F-test (i.e., p-value is less than the given significance level $\alpha$) indicates that at least one group mean differs from the others. However, ANOVA does not specify which pairs of groups are significantly different. **Post-hoc tests**, also known as **multiple comparison tests**, are used to identify these specific differences while controlling for the increased risk of Type I errors (false positives) that arise from performing multiple comparisons. 

There are different **Post-hoc tests**. We only introduce two commonly used post-hoc methods using R built-in functions:

* **Tukey’s Honest Significant Difference (HSD)** -Controls **family-wise error rate** (FWER) for all pairwise comparisons.

* **Bonferroni Correction** - Adjusts p-values by multiplying them by the number of comparisons


# Visual Comparison

We have learned different graphical summaries of data. Box-plot is a geometric representation of the 5-number-summary of a numerical variable. Since treatment variable has multiple categories, we can create a back-to-back box-plot to visualize the different in the distributions. 

The R function `boxplot()` can draw multiple boxplots of a numerical variable according the categories of the group variable. Since package **{graphics}** comes with the base R, installation and loading the package is unnecessary.

```{r}
boxplot(Length ~ Treatment, data = pea.long)
```

The above box-plots clearly show that the pea lengths using sugar media Fructose, Glucose, and GlucFruc are similar to each other.



## Tukey’s Honest Significant Difference (HSD)

Tukey’s Honest Significant Difference (HSD) is a post-hoc test used after a statistically significant ANOVA to determine which specific group means differ from each other. Unlike multiple t-tests, which inflate the Type I error rate, Tukey’s HSD adjusts for multiple comparisons, maintaining the family-wise error rate (FWER) at the desired level (e.g., 0.05).

Next, we perform a multiple comparison of pea length across different sugar media using the **plant physiology data**. For convenience, we use the long-format data frame defined earlier and an anova object in the following R code.

```{r}
## refit the ANOVA model
aov.model <- aov(Length ~ Treatment, data = pea.long)
## multiple comparison
tukey.comp <- TukeyHSD(aov.model)
print(tukey.comp)

```

The results indicate that there is no significant difference in pairs Fructose-Glucose, GlucFruc-Glucose, and GlucFruc-Fructose. 

```{r}
par(mai=c(1.5,2,1,1)) # Makes room on the plot for the group names
plot(tukey.comp,      # name of the TukeyHSD() object
     cex.lab = 0.6,   # adjust the font size of the labels of the vertical axis
     las = 1)         # orientation of the labels
```



## Bonferroni Comparison 

The Bonferroni correction is a conservative method for adjusting significance levels when performing multiple comparisons to reduce the risk of Type I errors (false positives). It is one of the simplest and most widely used approaches for controlling the family-wise error rate.

The R function `pairwise.t.test()` performs pairwise comparison. It requires inputting individual variables (response and treatment group). The following is code for the Bonferroni procedure.

```{r}
Bonfeeroni.result <- pairwise.t.test(
  x = pea.long$Length, 
  g = pea.long$Treatment,
  p.adjust.method = "bonferroni"
)
print(Bonfeeroni.result)
```

Using the p-value adjusted Bonferroni procedure, three pairs of treatment groups are significantly different: Fructose-Glucose, GlucFruc-Glucose, and GlucFruc-Fructose. This result is consistent with Tukeu's HSD method.

\

Before conclude this section, we recommend a YouTube video (<https://www.youtube.com/watch?v=hPUvqtmCu7Q>) with an example one multiple comparison among group means using R,

\

<center><a href="https://www.youtube.com/watch?v=hPUvqtmCu7Q" target="popup" 
                   onclick="window.open('https://www.youtube.com/watch?v=hPUvqtmCu7Q',
                      'name','width=850,height=500')"><img src = "https://pengdsci.github.io/MAT121W5/img/VideoIcon.png" width="200" height="120"></a>
</center>

\


# Concluding Remarks

**One-Way ANOVA** (Analysis of Variance) is a statistical method used to compare the means of three or more independent groups to determine if at least one group mean is significantly different from the others. It extends the t-test (which compares only two groups) to multiple groups.

Hypotheses

* **Null Hypothesis ($H_0$)**: All group means are equal($\mu_1 = \mu_2 = \cdots = \mu_k$).

* **Alternative Hypothesis ($H_a$)**: At least one group mean is different.


The fundamental assumptions of One-Way ANOVA include:

* **Independence**: Observations must be independent (e.g., random sampling).

* **Normality**: Data in each group should be approximately normally distributed (checked via Q-Q plots or Shapiro-Wilk test).

* **Homogeneity of Variance** (Homoscedasticity): Groups should have equal variances (tested using Levene’s test or Bartlett’s test).

If any of theses assumptions are violated, the classical ANOVA results will be invalid. There some alternatives of the classic ANOVA that will be covered in the subsequent related courses. The key information in the ANOVA analysis is summarized in the following one-way ANOVA table.

```{r echo = FALSE, fig.align='center', out.width="90%"}
include_graphics("week04/SummarizedANOVAtable.png")
```

If ANOVA rejects the null hypotheis, post-hoc tests identify which specific groups differ. We introduced two prcedures for this purpose.

* **Tukey’s HSD** (controls family-wise error rate).

* **Bonferroni Correction** (conservative adjustment).








