HD 5514 Research Methods (Fall 2019)
Inference for Regression (W13)
- In-Class Activity: Correlation
- In-Class Activity: Simple Linear Regression
  - Load Data
  - Check Data Set
  - Visualize your data using a scatter plot
  - Visualize your data using a boxplot and check for outliers
  - Check the normality of the response variable using a histogram
  - Check the normality of the response variable using a density plot
  - Check the normality of the response variable using a QQ plot
  - Fit a simple linear regression model
  - Produce a summary of the model fitting
  - Checking for statistical significance
- In-Class Activity: Multiple Linear Regression
- Assignment 9 (Week 13)
  - Read Data
  - Check Data
  - Use the help function to learn about variables
  - Check three data frame columns (income, education, prestige)
  - Visualize your variable and check for outliers
  - Check the normality of a potential response variable using a histogram
  - Check the normality of a potential response variable using a density plot
  - Check the normality of a potential response variable using a QQ plot
  - Check the normality of the response variable using a histogram
  - Check the normality of the response variable using a density plot
  - Check the normality of the response variable using a QQ plot
  - Fit a simple linear regression model
  - Produce a summary of the model fitting
In-Class Activity: Correlation
Load Data
We are going to use the cars data set, which is included in R by default. We will load and print the cars data.
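A minimal sketch of this step. Because cars ships with base R (the datasets package), nothing needs to be installed; data() is optional and only makes the object visible in the workspace.

```r
# cars comes with base R, so no package needs to be installed
data(cars)   # optional: copies the data set into the workspace
print(cars)  # print all rows
head(cars)   # or look at only the first 6 rows
```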
Check Data Set
Check the number of rows and columns.
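One way to check the size of the data, using base R functions only:

```r
dim(cars)   # number of rows and columns: 50 2
nrow(cars)  # rows only: 50
ncol(cars)  # columns only: 2
str(cars)   # column names and types in one view
```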
Visualize your data using a scatter plot
Find the data columns named speed (numeric; speed in mph) and dist (numeric; stopping distance in ft) in the cars data set. Create a scatter plot to visualize the relationship between the two variables.
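A sketch of the scatter plot with base graphics; the axis labels and title are illustrative choices, not part of the original notes.

```r
# Explanatory variable (speed) on the x-axis, response (dist) on the y-axis
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")
```

The formula form plot(dist ~ speed, data = cars) draws the same plot.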
Calculate correlation
Calculate the correlation between the two variables speed and dist using the cor function. Test the association for statistical significance using the cor.test function. The default is Pearson's correlation; for nonparametric tests, use Spearman's rank correlation (rho) with method="spearman" or Kendall's tau with method="kendall".
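A sketch of all three correlation tests. The exact = FALSE argument is an addition here: the cars data contain tied values, and exact = FALSE asks cor.test for the large-sample p-value instead of attempting an exact one (which would trigger a warning about ties).

```r
r <- cor(cars$speed, cars$dist)  # Pearson's r, about 0.81
r

# Pearson's product-moment correlation test (the default method)
pearson <- cor.test(cars$speed, cars$dist)
pearson

# Rank-based (nonparametric) alternatives; exact = FALSE because of ties
spearman <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
kendall  <- cor.test(cars$speed, cars$dist, method = "kendall",  exact = FALSE)
spearman$estimate  # Spearman's rho
kendall$estimate   # Kendall's tau
```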
In-Class Activity: Simple Linear Regression
Load Data
We are going to use the cars data set, which is included in R by default. We will load and print the cars data.
Check Data Set
Check the number of rows and columns.
Visualize your data using a scatter plot
Find the data columns named speed (numeric; speed in mph) and dist (numeric; stopping distance in ft) in the cars data set. Create a scatter plot to check whether the relationship between the two variables is linear.
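To judge linearity, it can help to add a smooth curve to the scatter plot. This sketch swaps in scatter.smooth() from the stats package, which draws the points together with a loess curve; a curve that stays close to a straight line is consistent with a linear model.

```r
# Scatter plot plus a loess smoothing curve (stats::scatter.smooth)
scatter.smooth(cars$speed, cars$dist,
               xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
               main = "Speed vs. distance with loess curve")
```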
Visualize your data using a boxplot and check for outliers
Use the boxplot function to create a boxplot for each variable: speed (numeric; speed in mph) and dist (numeric; stopping distance in ft) from the cars data set. Check whether there are any outliers based on the 1.5 x interquartile-range (1.5 * IQR) rule, which flags values more than 1.5 IQRs below the lower quartile or above the upper quartile.
# box plot for 'speed'
boxplot(cars$speed, main="Speed")
# box plot for 'dist'
boxplot(cars$dist, main="Distance")
# Visualize two figures side by side and list the outlier values (run the following 4 lines together)
par(mfrow=c(1, 2)) # divide graph area into 2 columns
# boxplot.stats()$out returns the outlier values (not row numbers)
boxplot(cars$speed, main="Speed", sub=paste("Outlier values:", paste(boxplot.stats(cars$speed)$out, collapse=", ")))
boxplot(cars$dist, main="Distance", sub=paste("Outlier values:", paste(boxplot.stats(cars$dist)$out, collapse=", ")))
par(mfrow=c(1, 1)) # reset to a single plot
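Since boxplot.stats() reports outlier values, a small extra step is needed to find which rows they come from. The sketch below also applies the 1.5 * IQR rule by hand; it uses fivenum() because boxplot.stats() measures the box with Tukey's hinges from fivenum(), which can differ slightly from quantile() on some data.

```r
# Outlier values for 'dist', then the rows they belong to
dist_out <- boxplot.stats(cars$dist)$out
which(cars$dist %in% dist_out)   # row number(s) of the outliers
cars[cars$dist %in% dist_out, ]  # the full outlying observations

# The same 1.5 * IQR rule by hand, with the hinges used by boxplot.stats()
h <- fivenum(cars$dist)          # min, lower hinge, median, upper hinge, max
iqr <- h[4] - h[2]
lower <- h[2] - 1.5 * iqr
upper <- h[4] + 1.5 * iqr
cars$dist[cars$dist < lower | cars$dist > upper]
```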
Check the normality of the response variable using a histogram
Create a histogram with the function hist(). You can change the number of bins using the breaks= argument.
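A sketch for the response variable dist. Note that R treats a single number given to breaks as a suggestion, so the actual number of bins may differ slightly.

```r
# Histogram of the response variable; breaks = 10 suggests about 10 bins
hist(cars$dist, breaks = 10,
     main = "Histogram of stopping distance", xlab = "Stopping distance (ft)")
```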
Check the normality of the response variable using a density plot
Create a density plot with the function density(), which computes a kernel density estimate that you then draw with plot().
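A sketch that draws the estimated density of dist and, as an added comparison, overlays a normal curve with the same mean and standard deviation (dashed line).

```r
d <- density(cars$dist)   # Gaussian kernel density estimate (the default)
plot(d, main = "Density of stopping distance", xlab = "Stopping distance (ft)")
# Normal curve with the sample mean and SD, for visual comparison
curve(dnorm(x, mean = mean(cars$dist), sd = sd(cars$dist)),
      add = TRUE, lty = 2)
```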
Check the normality of the response variable using a QQ plot
Create a QQ plot with the function qqnorm(). A Quantile-Quantile plot (QQ plot) plots the sample quantiles against the theoretical quantiles of a normal distribution; if the data are approximately normal, the points fall close to a straight line.
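A sketch using qqnorm() together with qqline(), which adds the reference line that makes departures from normality easier to see.

```r
qqnorm(cars$dist, main = "Normal QQ plot of stopping distance")
qqline(cars$dist)   # reference line through the first and third quartiles
```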
Fit a simple linear regression model
Fit a linear model with the function lm(). Regress the outcome variable dist on the explanatory variable speed.
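A sketch of the model fit; the object name model is an arbitrary choice. In the formula, the response goes to the left of ~ and the explanatory variable to the right.

```r
# Formula syntax: response ~ explanatory variable
model <- lm(dist ~ speed, data = cars)
model        # prints the call and the estimated coefficients
coef(model)  # intercept about -17.58, slope about 3.93
```

The slope says that each additional mph of speed is associated with roughly 3.9 more feet of stopping distance.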
Produce a summary of the model fitting
Produce a summary of the results of the model fitting using the summary function.
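A sketch that refits the model so the snippet runs on its own; model_summary is a hypothetical name.

```r
model <- lm(dist ~ speed, data = cars)
model_summary <- summary(model)
model_summary   # coefficient table, residual standard error, R-squared, F-statistic
```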
Checking for statistical significance
Check the coefficient estimates and their p-values, the F-statistic and its p-value, and R-squared.
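A sketch of pulling each of these quantities out of the summary object. summary() prints the overall F-test p-value but does not store it, so it is recomputed here with pf(). With a single predictor, this p-value equals the t-test p-value for the slope.

```r
model <- lm(dist ~ speed, data = cars)
s <- summary(model)
coef(s)            # estimates, standard errors, t values, p-values
s$r.squared        # share of the variance in dist explained by speed
s$adj.r.squared
f <- s$fstatistic  # F value with its numerator and denominator df
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_f                # p-value of the overall F-test
confint(model)     # 95% confidence intervals for the coefficients
```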
In-Class Activity: Multiple Linear Regression
Load Data
We are going to use the mtcars data set, which is included in R by default. We will load and print the mtcars data.
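As with cars, mtcars ships with base R, so this sketch needs no packages.

```r
data(mtcars)
print(mtcars)  # all 32 cars
head(mtcars)   # first 6 rows only
dim(mtcars)    # 32 rows, 11 columns
```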
Fit a multiple linear regression model
Fit a linear model with the function lm(). Regress the outcome variable mpg (Miles/(US) gallon) on the explanatory variables cyl (Number of cylinders), wt (Weight (1000 lbs)), and vs (Engine: 0 = V-shaped, 1 = straight).
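A sketch of the fit; model_mlr is a hypothetical name. Explanatory variables are joined with +. In this sketch cyl is treated as a numeric variable; wrapping it in factor(cyl) would instead estimate a separate effect for each cylinder count, which is a modelling choice rather than something the notes specify.

```r
# Multiple regression: several explanatory variables joined with "+"
model_mlr <- lm(mpg ~ cyl + wt + vs, data = mtcars)
coef(model_mlr)   # intercept plus one slope per explanatory variable
```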
Produce a summary of the model fitting
Produce a summary of the results of the model fitting using the summary function.
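A sketch that refits the model and extracts the parts most often reported. With several predictors, the adjusted R-squared is the more appropriate measure of fit because it penalizes additional variables.

```r
model_mlr <- lm(mpg ~ cyl + wt + vs, data = mtcars)
s_mlr <- summary(model_mlr)
s_mlr                       # full summary
coef(s_mlr)[, "Pr(>|t|)"]   # p-value of each coefficient
s_mlr$adj.r.squared         # adjusted R-squared
```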
Assignment 9 (Week 13)
Read Data
We will use the data set Prestige that comes with the R package car (the data are stored in the companion package carData, which is loaded together with car). We will load and print the Prestige data.
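A sketch of loading the data. The install guard and the CRAN mirror URL are assumptions added for convenience; if car is already installed, only library(car) is needed.

```r
# Install car once if it is missing (this also installs carData)
if (!requireNamespace("car", quietly = TRUE)) {
  install.packages("car", repos = "https://cloud.r-project.org")
}
library(car)      # also attaches carData, which contains Prestige
print(Prestige)   # all occupations
head(Prestige)    # first 6 rows only
```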
Check Data
Check the number of rows and columns.
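The same base R functions used for cars work here; this sketch loads car first so that it runs on its own.

```r
library(car)
dim(Prestige)   # 102 occupations, 6 columns
str(Prestige)   # column names and types
```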
Use the help function to learn about variables
To learn more about the variables in the Prestige data set, open its help page with ?Prestige (or help(Prestige)).
Check three data frame columns (income, education, prestige)
First, find the data column named income in the Prestige data set.
Next, find the data column named education.
Lastly, find the data column named prestige.
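A sketch that extracts the three columns with $ and summarizes them together.

```r
library(car)
Prestige$income      # average income of incumbents (dollars, 1971)
Prestige$education   # average education of incumbents (years, 1971)
Prestige$prestige    # Pineo-Porter prestige score
summary(Prestige[, c("income", "education", "prestige")])
```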
Visualize your variable and check for outliers
Use the boxplot function to create a boxplot for each variable: income (average income of incumbents, dollars, in 1971), education (average education of occupational incumbents, years, in 1971), and prestige (Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s) from the Prestige data set. Check whether there are any outliers based on the 1.5 x interquartile-range (1.5 * IQR) rule.
library(car) # provides the Prestige data set (via carData)
?Prestige
# box plot for 'income'
boxplot(Prestige$income, main="Income")
boxplot(Prestige$income, main="Income", sub=paste("Outlier values:", paste(boxplot.stats(Prestige$income)$out, collapse=", ")))
# Check the outliers for 'income' (values, not row numbers)
boxplot.stats(Prestige$income)$out
# box plot for 'education'
boxplot(Prestige$education, main="Education")
# Check the outliers for 'education'
boxplot(Prestige$education, main="Education", sub=paste("Outlier values:", paste(boxplot.stats(Prestige$education)$out, collapse=", ")))
# box plot for 'prestige'
boxplot(Prestige$prestige, main="Prestige")
# Check the outliers for 'prestige'
boxplot(Prestige$prestige, main="Prestige", sub=paste("Outlier values:", paste(boxplot.stats(Prestige$prestige)$out, collapse=", ")))
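Because the rows of Prestige are named after occupations, the outlier values from boxplot.stats() can be mapped back to the occupations they belong to. A sketch for income; income_out is a hypothetical name.

```r
library(car)
income_out <- boxplot.stats(Prestige$income)$out
# Occupations whose income falls outside the 1.5 * IQR fences
rownames(Prestige)[Prestige$income %in% income_out]
```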
Check the normality of a potential response variable using a histogram
Create a histogram with the function hist(). You can change the number of bins using the breaks= argument.
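The notes do not name the potential response variable here, so this sketch uses income purely as an example. It draws two histograms side by side to show how the breaks setting changes the picture.

```r
library(car)
# Example variable: income (an assumption; substitute your own choice)
par(mfrow = c(1, 2))   # two plots side by side
hist(Prestige$income, breaks = 10, main = "breaks = 10", xlab = "Income ($)")
hist(Prestige$income, breaks = 30, main = "breaks = 30", xlab = "Income ($)")
par(mfrow = c(1, 1))   # back to one plot
```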
Check the normality of a potential response variable using a density plot
Create a density plot with the function density(), which computes a kernel density estimate that you then draw with plot().
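Continuing with income as the example variable (an assumption), this sketch adds rug(), which marks each observation along the x-axis so that sparse tails are easy to spot.

```r
library(car)
d_inc <- density(Prestige$income)
plot(d_inc, main = "Density of income", xlab = "Income ($)")
rug(Prestige$income)   # one tick mark per occupation
```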
Check the normality of a potential response variable using a QQ plot
Create a QQ plot with the function qqnorm(). A Quantile-Quantile plot (QQ plot) plots the sample quantiles against the theoretical quantiles of a normal distribution; if the data are approximately normal, the points fall close to a straight line.
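A sketch for the example variable income (an assumption). If the upper tail is longer than a normal distribution's, the points at the right end bend upward away from the reference line.

```r
library(car)
qqnorm(Prestige$income, main = "Normal QQ plot of income")
qqline(Prestige$income)   # reference line through the quartiles
```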
Check the normality of the response variable using a histogram
Create a histogram with the function hist(). You can change the number of bins using the breaks= argument.
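The response in the regression below is prestige, so this sketch uses it. Setting freq = FALSE puts densities rather than counts on the y-axis, which lets a density curve be drawn on top of the bars.

```r
library(car)
hist(Prestige$prestige, breaks = 10, freq = FALSE,
     main = "Histogram of prestige", xlab = "Prestige score")
lines(density(Prestige$prestige))   # smooth density estimate over the bars
```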
Check the normality of the response variable using a density plot
Create a density plot with the function density(), which computes a kernel density estimate that you then draw with plot().
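A sketch for prestige that overlays a normal curve with the sample mean and standard deviation (dashed) as a reference for judging normality.

```r
library(car)
d_pre <- density(Prestige$prestige)
plot(d_pre, main = "Density of prestige", xlab = "Prestige score")
curve(dnorm(x, mean = mean(Prestige$prestige), sd = sd(Prestige$prestige)),
      add = TRUE, lty = 2)
```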
Check the normality of the response variable using a QQ plot
Create a QQ plot with the function qqnorm(). A Quantile-Quantile plot (QQ plot) plots the sample quantiles against the theoretical quantiles of a normal distribution; if the data are approximately normal, the points fall close to a straight line.
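Since car is already loaded for this assignment, its qqPlot() function is an alternative to qqnorm(). By default it adds a reference line and a pointwise confidence envelope and labels the most extreme points. This is a swapped-in function, not the one named in the notes; qqnorm(Prestige$prestige) followed by qqline(Prestige$prestige) works just as well.

```r
library(car)
# QQ plot with reference line and confidence envelope (car::qqPlot)
qqPlot(Prestige$prestige, ylab = "Prestige score")
```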
Fit a simple linear regression model
Fit a linear model with the function lm(). Regress the outcome variable prestige on the explanatory variable education.
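A sketch of the fit, with the fitted line drawn over a scatter plot as a visual check; model_pre is a hypothetical name.

```r
library(car)
model_pre <- lm(prestige ~ education, data = Prestige)
coef(model_pre)
plot(prestige ~ education, data = Prestige,
     xlab = "Education (years)", ylab = "Prestige score")
abline(model_pre)   # add the fitted regression line
```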
Produce a summary of the model fitting
Produce a summary of the results of the model fitting using the summary function.
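A sketch that refits the model and pulls out the coefficient table and R-squared, mirroring the cars example.

```r
library(car)
model_pre <- lm(prestige ~ education, data = Prestige)
s_pre <- summary(model_pre)
s_pre             # full summary
coef(s_pre)       # estimate, standard error, t value, p-value
s_pre$r.squared   # share of the variance in prestige explained by education
```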