HD 5514 Research Methods (Fall 2019)

In-Class Acitivity: Analysis of Variance (ANOVA)

Load Data

We are going to use R.appendix1.data from Personality Project. We will load and print the data.

   Dosage Alertness
1       a        30
2       a        38
3       a        35
4       a        41
5       a        27
6       a        24
7       b        32
8       b        26
9       b        31
10      b        29
11      b        27
12      b        35
13      b        21
14      b        25
15      c        17
16      c        21
17      c        20
18      c        19
  Dosage Alertness
1      a        30
2      a        38
3      a        35
4      a        41
5      a        27
6      a        24

Check Data Set

Check the number of rows and columns.

[1] 18  2
'data.frame':   18 obs. of  2 variables:
 $ Dosage   : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 2 2 2 2 ...
 $ Alertness: int  30 38 35 41 27 24 32 26 31 29 ...
[1] "a" "b" "c"

Conduct one-way Anova

            Df Sum Sq Mean Sq F value  Pr(>F)   
Dosage       2  426.2  213.12   8.789 0.00298 **
Residuals   15  363.8   24.25                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Report means

model.tables computes summary tables for model fits, especially aov fits.

Tables of means
Grand mean
         
27.66667 

 Dosage 
       a     b     c
    32.5 28.25 19.25
rep  6.0  8.00  4.00

Check ANOVA assumptions

ANOVA assumes normal distribution and homogeneity of variances.

Check the homogeneity of variance assumption: Levene’s test

Perform Leven’s test using the function leveneTest in car package. If variance across groups is not significantly different, we can assume the homogeneity of variances in the different groups.

Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  2  4.1667 0.03638 *
      15                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA test with no assumption of equal variances: Welch one-way test

Welch one-way test is an alternative that does not require thae homogeneity of variance assumption.


    One-way analysis of means (not assuming equal variances)

data:  Alertness and Dosage
F = 19.339, num df = 2.0000, denom df = 9.2651, p-value =
0.0004931

Check the normality assumption

We can assume normality if all the points fall approximately along this reference line. Or you can conduct Shapiro-Wilk test on ANOVA residuals.


    Shapiro-Wilk normality test

data:  residuals(aov.ex1)
W = 0.98604, p-value = 0.991

Non-parametric alternative to one-way ANOVA test: Kruskal-Wallis rank sum test

When ANOVA assumptions are not met, you cna use a non-parametric alternative.


    Kruskal-Wallis rank sum test

data:  Alertness by Dosage
Kruskal-Wallis chi-squared = 9.4272, df = 2, p-value = 0.008972

In-Class Acitivity: Multiple pairwise-comparison

Bonferroni and Holm multiple pairwise-comparisons

In one-way Anova, a siginificant p-value means some of the group means are different. However, this does not tell us which pairs of groups are different. We can perform multiple pairwise-comparison to test the statistically significant mean difference between particular pairs of group. Given the significant ANOVA test, we can compute adjusted p-values using pairwise.t.test.


    Pairwise comparisons using t tests with pooled SD 

data:  data1$Alertness and data1$Dosage 

  a       b      
b 0.13088 -      
c 0.00082 0.00926

P value adjustment method: none 

    Pairwise comparisons using t tests with pooled SD 

data:  data1$Alertness and data1$Dosage 

  a      b     
b 0.3926 -     
c 0.0025 0.0278

P value adjustment method: bonferroni 

    Pairwise comparisons using t tests with pooled SD 

data:  data1$Alertness and data1$Dosage 

  a      b     
b 0.1309 -     
c 0.0025 0.0185

P value adjustment method: holm 

Tukey multiple pairwise-comparisons

Tukey HSD (Tukey Honest Significant Differences) using TukeyHD.

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Alertness ~ Dosage, data = data1)

$Dosage
      diff       lwr       upr     p adj
b-a  -4.25 -11.15796  2.657961 0.2768132
c-a -13.25 -21.50659 -4.993408 0.0022342
c-b  -9.00 -16.83289 -1.167109 0.0237003

In-Class Acitivity: Planned Constrasts

You can test some specific hypotheses than testing all possible mean comparisons. We can combine multiple means from different levels and compare two means (e.g., a and b vs. c)

Analysis of Variance Table

Response: Alertness
          Df Sum Sq Mean Sq F value   Pr(>F)   
c1         1  42.98   42.98  1.7722 0.202990   
c2         1 383.27  383.27 15.8051 0.001218 **
Residuals 15 363.75   24.25                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In-Class Acitivity: Tidyverse

tidyverse is an opinionated collection of R packages designed for data science. https://www.tidyverse.org/ This is designed to make it easy to install and load core packages from the tidyverse. magrittr offers a set of operators which make your code more readable such as the pipe operator %>% https://magrittr.tidyverse.org/ dplyr provides a set of tools for efficiently manipulating datasets: https://dplyr.tidyverse.org/.

Computes summary tables using dplyr

# A tibble: 3 x 4
  Dosage count  mean    sd
  <fct>  <int> <dbl> <dbl>
1 a          6  32.5  6.60
2 b          8  28.2  4.43
3 c          4  19.2  1.71

Assignment 10 (Week 15)

Read Data

We will use build-in data set Prestige contained in the R package car. We will load and print the survey data.

                    education income women prestige census type
gov.administrators      13.11  12351 11.16     68.8   1113 prof
general.managers        12.26  25879  4.02     69.1   1130 prof
accountants             12.77   9271 15.70     63.4   1171 prof
purchasing.officers     11.42   8865  9.11     56.8   1175 prof
chemists                14.62   8403 11.68     73.5   2111 prof
physicists              15.64  11030  5.13     77.6   2113 prof

Check Data

Check the number of rows and columns.

'data.frame':   102 obs. of  6 variables:
 $ education: num  13.1 12.3 12.8 11.4 14.6 ...
 $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
 $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
 $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
 $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
 $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
[1] 102   6

Use the help function to learn about variables

If you want to learn more about the t.test function.

Koeun Choi

December 4, 2019