Week 2: Chi-square Tests for Goodness-of-fit and Independence
Zoom Office Hours: Tuesday/Wednesday/Thursday 1:30 PM - 3:00 PM
1. Chi-square Distribution and Goodness-of-fit Test
Lecture Note: HTML | PDF
- Chi-square Distribution
- Chi-square distribution density curve: skewed to the right
- Finding right-tail probabilities: used to compute p-values of chi-square tests
- Finding percentiles: used to find critical values
- Two R built-in functions
- right-tail probability: pchisq(x, df, lower.tail = FALSE)
- percentile: qchisq(p, df, lower.tail = TRUE)
- Chi-square Goodness-of-fit Test
- Setting up Hypothesis
- H0: The data follow the given distribution $p=(p_1, p_2, \cdots, p_k)$
- Ha: The data do NOT follow the given distribution $p=(p_1, p_2, \cdots, p_k)$
- Calculate expected frequencies under H0
- Find the total number of observations (i.e., the sample size), denoted by n
- Expected frequency of the j-th cell: $E_j = n p_j$
- Test statistic: $G^2= \sum_{j=1}^k (O_j - E_j)^2/E_j \rightarrow \chi^2_{k-1}$
- Implementation in R: chisq.test(obs.freq, p = p) (p must be passed by name, because the second positional argument of chisq.test() is y)
- Observed frequencies: obs.freq = $(n_1, n_2, \cdots, n_k)$
- Hypothetical probability distribution: $p=(p_1, p_2, \cdots, p_k)$
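The steps above (expected frequencies, test statistic, right-tail p-value, and critical value) can be sketched in R as follows. This is a minimal illustration, not the lecture's own example: the counts are made up (assumed to be 60 rolls of a die, with H0 saying all faces are equally likely), and the names `n`, `E`, `G2`, `df`, `crit`, `reject`, and `res` are hypothetical.

```r
# Chi-square goodness-of-fit test, step by step (hypothetical data).
# Assumed example: 60 rolls of a die; H0 says all six faces are equally likely.
obs.freq <- c(8, 12, 10, 9, 11, 10)   # observed frequencies O_j
p <- rep(1/6, 6)                      # hypothesized distribution under H0

n  <- sum(obs.freq)                   # sample size
E  <- n * p                           # expected frequencies E_j = n * p_j
G2 <- sum((obs.freq - E)^2 / E)       # test statistic
df <- length(obs.freq) - 1            # k - 1 degrees of freedom

# p-value: right-tail probability of the chi-square distribution
p.value <- pchisq(G2, df, lower.tail = FALSE)

# Critical value at alpha = 0.05: the 95th percentile of chi-square(df)
crit   <- qchisq(0.95, df)
reject <- G2 > crit                   # FALSE here: G2 = 1 is well below crit

# The built-in test gives the same statistic, df, and p-value.
# p is passed by name: the second positional argument of chisq.test() is y.
res <- chisq.test(obs.freq, p = p)
res$statistic   # X-squared
res$parameter   # df
res$p.value
```

Computing the statistic by hand first makes it easy to check each piece of the chisq.test() output against the formula.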
2. Chi-square Test of Independence
Lecture Note: HTML | PDF
- Study Designs
- Retrospective cohort study design (looks at historical data)
- Prospective cohort study design (follow-up study)
- Cross-sectional study (looks at current data: a snapshot of observations)
- Measures of Association
- Absolute risk, relative risk, and attributable risk
- Odds ratio
- Risk measures vs. study designs: choosing a measure according to the design
- $\chi^2$ test of independence between two categorical variables
- Observed two-way contingency table with I rows and J columns
- Null hypothesis $H_0$: the two categorical variables are independent.
- Expected frequencies under $H_0$:
- Expected frequency of i-th row and j-th column: $E_{ij}=\text{i-th row total}\times \text{j-th column total}/\text{grand total}$
- Test statistic: $G^2 = \sum_{i=1}^I\sum_{j=1}^J (O_{ij}-E_{ij})^2/E_{ij} \rightarrow \chi^2_{(I-1)(J-1)}$
- Implementation in R: chisq.test()
- Using observed contingency table: chisq.test(obs.table)
- Using two categorical variables directly: chisq.test(x,y)
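The measures of association and the test of independence above can both be sketched in R on one small table. The counts below are made up for illustration; rows are assumed to be exposure status and columns disease status, and attributable risk is taken here as the risk difference (a common definition that may differ from the lecture note's). All object names (`obs.table`, `risk.exp`, `RR`, `AR`, `OR`, `long`, `exposure`, `disease`, and so on) are hypothetical. Note that chisq.test() applies Yates' continuity correction to 2 x 2 tables by default, so `correct = FALSE` is used to match the formula above.

```r
# Hypothetical 2 x 2 table (illustrative counts, not from the lecture):
# rows = exposure status, columns = disease status
obs.table <- matrix(c(30, 10, 70, 90), nrow = 2,
                    dimnames = list(Exposure = c("Yes", "No"),
                                    Disease  = c("Yes", "No")))

# Measures of association
risk.exp   <- obs.table["Yes", "Yes"] / sum(obs.table["Yes", ])  # absolute risk, exposed (0.3)
risk.unexp <- obs.table["No", "Yes"]  / sum(obs.table["No", ])   # absolute risk, unexposed (0.1)
RR <- risk.exp / risk.unexp            # relative risk
AR <- risk.exp - risk.unexp            # attributable risk, assumed = risk difference
OR <- (obs.table["Yes", "Yes"] * obs.table["No", "No"]) /
      (obs.table["Yes", "No"]  * obs.table["No", "Yes"])          # odds ratio

# Chi-square test of independence by hand
E  <- outer(rowSums(obs.table), colSums(obs.table)) / sum(obs.table)  # E_ij
G2 <- sum((obs.table - E)^2 / E)                                     # test statistic
df <- (nrow(obs.table) - 1) * (ncol(obs.table) - 1)                  # (I-1)(J-1)
p.value <- pchisq(G2, df, lower.tail = FALSE)

# Built-in test from the observed table (Yates' correction turned off)
res <- chisq.test(obs.table, correct = FALSE)

# Same test from two raw categorical variables: expand the table into
# one record per subject, then pass the two variables as x and y
long     <- as.data.frame(as.table(obs.table))   # columns Exposure, Disease, Freq
exposure <- rep(long$Exposure, long$Freq)
disease  <- rep(long$Disease,  long$Freq)
res.xy   <- chisq.test(exposure, disease, correct = FALSE)
```

Both calls to chisq.test() should report the same statistic as `G2`, and `res$expected` holds the same expected frequencies as `E`.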
3. Weekly Exam #2 Information
- Open: Friday at noon
- Close: Sunday at midnight
- Guideline: PDF
Week 1: Computing Software and Review of Introductory Statistics
Zoom Office Hours: Tuesday/Wednesday/Thursday 1:30 PM - 3:00 PM
1. Course Information and Learning Advice
Note: HTML | PDF
2. Getting started with R and RStudio
Lecture Note: HTML | PDF
3. Basic statistics review
Lecture Note: HTML | PDF
- Sampling distributions and Central Limit Theorem (CLT)
- Confidence intervals for population means: normal and t distributions
- Hypothesis testing: logic, steps, p-value
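As a quick refresher on the review topics above, here is a minimal R sketch of a one-sample t confidence interval and two-sided t-test, computed by hand and then with the built-in t.test(). The sample values and the null value mu = 5 are made up for illustration, and the object names are hypothetical.

```r
# Hypothetical sample (illustrative values only)
x <- c(5.1, 4.9, 5.6, 5.0, 5.3, 4.8, 5.2, 5.4)
n <- length(x)
xbar <- mean(x)
se <- sd(x) / sqrt(n)                  # estimated standard error of the sample mean

# 95% t confidence interval: xbar +/- t_{0.975, n-1} * se
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se

# Two-sided one-sample t-test of H0: mu = 5
t.stat  <- (xbar - 5) / se
p.value <- 2 * pt(abs(t.stat), df = n - 1, lower.tail = FALSE)

# Built-in equivalent: res$conf.int, res$statistic, res$p.value
res <- t.test(x, mu = 5, conf.level = 0.95)
```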
4. Least-squares simple linear regression (SLR): inference and applications
Lecture Note: HTML | PDF
- Model structure and interpretation of the coefficients, with an emphasis on the slope parameter
- Assumptions and model diagnostics (validation)
- R functions for creating various residual diagnostic plots
- Clear understanding of R output and how to use that information to:
- perform hypothesis testing on the slope
- construct confidence interval of the slope
- assess the goodness of fit using $R^2$
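The pieces of R output listed above can be pulled out of a fitted model as follows. This is a sketch on made-up data; the object names (`fit`, `s`, `coef.tab`, `b1`, `se1`, `t1`, `p1`, `ci.slope`, `r2`) are hypothetical, while lm(), summary(), and confint() are the standard R functions.

```r
# Hypothetical data (illustrative values only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

fit <- lm(y ~ x)                       # least-squares fit of y = b0 + b1 * x
s   <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef.tab <- s$coefficients
b1  <- coef.tab["x", "Estimate"]
se1 <- coef.tab["x", "Std. Error"]
t1  <- coef.tab["x", "t value"]        # t statistic for H0: slope = 0
p1  <- coef.tab["x", "Pr(>|t|)"]       # two-sided p-value

# 95% confidence interval for the slope: b1 +/- t_{0.975, n-2} * se1
ci.slope <- confint(fit, "x", level = 0.95)

# Goodness of fit: R^2 (in SLR it equals the squared correlation of x and y)
r2 <- s$r.squared

# Residual diagnostic plots (residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage):
# par(mfrow = c(2, 2)); plot(fit)
```

The slope test and interval use n - 2 degrees of freedom because two parameters (intercept and slope) are estimated.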
5. Weekly Exam #1
- Information: PDF
- Answer Key and Summary of Exam #1: PDF