# HD 5514 Research Methods (Fall 2019)

# Inference for Regression (W13)

- In-Class Activity: Correlation
- In-Class Activity: Simple Linear Regression
  - Load Data
  - Check Data Set
  - Visualize your data using a scatter plot
  - Visualize your data using a boxplot and check for outliers
  - Check the normality of the response variable using a histogram
  - Check the normality of the response variable using a density plot
  - Check the normality of the response variable using a QQ plot
  - Fit a simple linear regression model
  - Produce a summary of the model fitting
  - Checking for statistical significance
- In-Class Activity: Multiple Linear Regression
- Assignment 9 (Week 13)
  - Read Data
  - Check Data
  - Use the help function to learn about variables
  - Check three data frame columns (income, education, prestige)
  - Visualize your variable and check for outliers
  - Check the normality of a potential response variable using a histogram
  - Check the normality of a potential response variable using a density plot
  - Check the normality of a potential response variable using a QQ plot
  - Check the normality of the response variable using a histogram
  - Check the normality of the response variable using a density plot
  - Check the normality of the response variable using a QQ plot
  - Fit a simple linear regression model
  - Produce a summary of the model fitting

# In-Class Activity: Correlation

## Load Data

We are going to use the `cars` data set, which is included in R by default. We will load and print the `cars` data.
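A minimal sketch of this step, assuming a standard R installation in which the `datasets` package (which provides `cars`) is attached by default:

```r
# Load the built-in cars data set into the workspace
data(cars)

# Print the whole data frame, then only its first six rows
print(cars)
head(cars)
```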

## Check Data Set

Check the number of rows and columns.
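One way to do this, sketched with base R functions only:

```r
# Number of rows followed by number of columns
dim(cars)

# The same information, one value at a time
nrow(cars)
ncol(cars)

# Column names and types at a glance
str(cars)
```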

## Visualize your data using a scatter plot

Find the data columns named `speed` (numeric; speed in mph) and `dist` (numeric; stopping distance in ft) in the `cars` data set. Create a scatter plot to visualize the relationship between the two variables.
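A possible scatter plot using base graphics. The axis labels and title are illustrative choices, not part of the original exercise:

```r
# Scatter plot of stopping distance against speed
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")
```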

## Calculate correlation

Calculate the correlation between the two variables `speed` and `dist` using the `cor` function. Test the association for statistical significance using the `cor.test` function. The default is Pearson’s correlation, but you can use Spearman’s rank correlation (rho) with `method="spearman"` or Kendall’s tau with `method="kendall"` for nonparametric tests.
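The calls described above could look like the following sketch. The object names (`r`, `pearson`, `spearman`, `kendall`) are arbitrary, and `exact = FALSE` is an added choice that asks for an asymptotic p-value so that the tied values in `cars` do not trigger a warning:

```r
# Pearson's correlation coefficient (the default method)
r <- cor(cars$speed, cars$dist)
r

# Test whether the Pearson correlation differs from zero
pearson <- cor.test(cars$speed, cars$dist)
pearson

# Nonparametric alternatives based on ranks
spearman <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
kendall  <- cor.test(cars$speed, cars$dist, method = "kendall",  exact = FALSE)

spearman$estimate  # Spearman's rho
kendall$estimate   # Kendall's tau
```

Each `cor.test` result is a list, so the estimate and p-value can be pulled out with `$estimate` and `$p.value`.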

# In-Class Activity: Simple Linear Regression

## Load Data

We are going to use the `cars` data set, which is included in R by default. We will load and print the `cars` data.

## Check Data Set

Check the number of rows and columns.

## Visualize your data using a scatter plot

Find the data columns named `speed` (numeric; speed in mph) and `dist` (numeric; stopping distance in ft) in the `cars` data set. Create a scatter plot to visualize the linear relationship between the two variables.

## Visualize your data using a boxplot and check for outliers

Use the `boxplot` function to create a boxplot for each variable, `speed` (numeric; speed in mph) and `dist` (numeric; stopping distance in ft), from the `cars` data set. Check whether there are any outliers based on the 1.5 x interquartile-range (1.5 * IQR) rule, that is, values lying more than 1.5 * IQR beyond the edges of the box.

```
# box plot for 'speed'
boxplot(cars$speed, main = "Speed")
# box plot for 'dist'
boxplot(cars$dist, main = "Distance")
# Show the two plots side by side and add notes (run the following lines together)
par(mfrow = c(1, 2)) # divide the graph area into 2 columns
# boxplot.stats()$out returns the outlying values themselves (not row numbers);
# toString() joins them into a single label
boxplot(cars$speed, main = "Speed", sub = paste("Outlier values:", toString(boxplot.stats(cars$speed)$out)))
boxplot(cars$dist, main = "Distance", sub = paste("Outlier values:", toString(boxplot.stats(cars$dist)$out)))
par(mfrow = c(1, 1)) # restore the default single-panel layout
```

## Check the normality of the response variable using a histogram

Create a histogram with the function `hist()`. You can change the number of bins using the `breaks=` argument.
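A short sketch for the response variable `dist`. The titles and the value `breaks = 10` are illustrative. Note that R treats a single number passed to `breaks` as a suggestion, so the resulting number of bins may differ slightly:

```r
# Histogram of stopping distance with the default bins
hist(cars$dist,
     main = "Histogram of stopping distance",
     xlab = "Stopping distance (ft)")

# Same histogram, asking for roughly 10 bins; hist() also returns
# the bin boundaries and counts, stored here in h
h <- hist(cars$dist, breaks = 10,
          main = "Histogram of stopping distance (about 10 bins)",
          xlab = "Stopping distance (ft)")
h$counts
```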

## Check the normality of the response variable using a density plot

Compute a kernel density estimate with the function `density()` and pass the result to `plot()` to draw the density plot.
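For example, for the response variable `dist` (the object name `d`, the title, and the fill color are arbitrary choices):

```r
# Kernel density estimate of stopping distance
d <- density(cars$dist)

# Draw the estimated density curve
plot(d,
     main = "Density of stopping distance",
     xlab = "Stopping distance (ft)")

# Optionally shade the area under the curve
polygon(d, col = "lightblue", border = "black")
```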

## Check the normality of the response variable using a QQ plot

Create a QQ plot with the function `qqnorm()`. A quantile-quantile plot (QQ plot) compares the quantiles of a given sample with the quantiles of the normal distribution; points that fall close to a straight line suggest that the sample is approximately normal.
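A sketch for `dist`. Adding a reference line with `qqline()` is an extra step beyond the text above, included here because it makes departures from normality easier to see:

```r
# Normal QQ plot of stopping distance; qqnorm() also returns the
# plotted theoretical (x) and sample (y) quantiles
qq <- qqnorm(cars$dist, main = "Normal QQ plot of stopping distance")

# Reference line through the first and third quartiles
qqline(cars$dist, col = "red")
```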

## Fit a simple linear regression model

Fit a linear model with the function `lm()`. Regress the outcome variable `dist` on the explanatory variable `speed`.
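The fit could be sketched as follows. The object name `fit` is an arbitrary choice, and overlaying the fitted line on the scatter plot is an optional extra:

```r
# Regress stopping distance on speed
fit <- lm(dist ~ speed, data = cars)

# Print the call and the estimated coefficients
fit
coef(fit)

# Optional: draw the fitted line over the scatter plot
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit, col = "red")
```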

## Produce a summary of the model fitting

Produce a summary of the model fit using the `summary` function.
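A minimal sketch, refitting the model so that the block runs on its own (`fit` and `fit_summary` are arbitrary names):

```r
# Fit the simple linear regression model
fit <- lm(dist ~ speed, data = cars)

# Full summary: residuals, coefficient table, R-squared, F-statistic
fit_summary <- summary(fit)
fit_summary

# Individual pieces of the summary
fit_summary$coefficients  # estimates, standard errors, t values, p-values
fit_summary$r.squared     # proportion of variance explained
```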

## Checking for statistical significance

Check the coefficient estimates and their p-values, the overall F-statistic, and R-squared in the summary output.
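The same quantities can also be extracted programmatically. The sketch below assumes a significance level of 0.05, which is a conventional choice and not stated in the exercise. The p-value of the F-test is not stored in the summary object, so it is recomputed here with `pf()`:

```r
fit_summary <- summary(lm(dist ~ speed, data = cars))

# p-values of the t-tests for the intercept and the slope
p_values <- fit_summary$coefficients[, "Pr(>|t|)"]

# Overall F-test: statistic plus numerator and denominator degrees of freedom
f <- fit_summary$fstatistic
f_p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

alpha <- 0.05       # assumed significance level
p_values < alpha    # are the individual coefficients significant?
f_p_value < alpha   # is the model as a whole significant?

# Proportion of the variance in dist explained by speed
fit_summary$r.squared
```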

# In-Class Activity: Multiple Linear Regression

## Load Data

We are going to use the `mtcars` data set, which is included in R by default. We will load and print the `mtcars` data.

## Fit a multiple linear regression model

Fit a linear model with the function `lm()`. Regress the outcome variable `mpg` (Miles/(US) gallon) on the explanatory variables `cyl` (Number of cylinders), `wt` (Weight (1000 lbs)), and `vs` (Engine (0 = V-shaped, 1 = straight)).
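A sketch of the model described above. The object name `fit_mlr` is arbitrary. Note that `cyl` and `vs` are stored as numbers in `mtcars`, so this formula treats them as numeric predictors; wrapping them in `factor()` would be a different modeling choice:

```r
# Load the built-in mtcars data set
data(mtcars)

# Regress fuel economy on cylinders, weight, and engine shape
fit_mlr <- lm(mpg ~ cyl + wt + vs, data = mtcars)

# Estimated intercept and slopes
coef(fit_mlr)
```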

## Produce a summary of the model fitting

Produce a summary of the model fit using the `summary` function.
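For a model with several predictors it can help to look at the adjusted R-squared as well, since it accounts for the number of predictors. A minimal sketch (object names are arbitrary):

```r
fit_mlr <- lm(mpg ~ cyl + wt + vs, data = mtcars)

# Coefficient table, R-squared, and overall F-statistic
mlr_summary <- summary(fit_mlr)
mlr_summary

# R-squared and its adjusted version
mlr_summary$r.squared
mlr_summary$adj.r.squared
```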

# Assignment 9 (Week 13)

## Read Data

We will use the built-in data set `Prestige` contained in the R package `car`. We will load and print the `Prestige` data.
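A sketch of this step, assuming the `car` package has been installed (for example with `install.packages("car")`). In recent versions of `car` the data sets are supplied by the companion package `carData`, which is attached automatically together with `car`:

```r
# Guard so the script gives a readable message when car is missing
if (requireNamespace("car", quietly = TRUE)) {
  library(car)           # also attaches carData, which provides Prestige
  print(head(Prestige))  # first six occupations
} else {
  message("Package 'car' is not installed; run install.packages('car') first.")
}
```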

## Check Data

Check the number of rows and columns.

## Use the help function to learn about variables

Use the help function to learn more about the data set and its variables, for example with `?Prestige`.

## Check three data frame columns (income, education, prestige)

First, find the data column named `income` in the `Prestige` data set.

Next, find another data column, named `education`, in the `Prestige` data set.

Lastly, find the data column named `prestige` in the `Prestige` data set.

## Visualize your variable and check for outliers

Use the `boxplot` function to create a boxplot for each variable, `income` (Average income of incumbents, dollars, in 1971), `education` (Average education of occupational incumbents, years, in 1971), and `prestige` (Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s), from the `Prestige` data set. Check whether there are any outliers based on the 1.5 x interquartile-range (1.5 * IQR) rule.

```
# open the help page for the data set
?Prestige
# box plot for 'income'
boxplot(Prestige$income, main = "Income")
# box plot for 'income' with the outlying values listed below the plot
# (boxplot.stats()$out returns values, not row numbers)
boxplot(Prestige$income, main = "Income", sub = paste("Outlier values:", toString(boxplot.stats(Prestige$income)$out)))
# Check the outliers for 'income'
boxplot.stats(Prestige$income)$out
# box plot for 'education'
boxplot(Prestige$education, main = "Education")
# Check the outliers for 'education'
boxplot(Prestige$education, main = "Education", sub = paste("Outlier values:", toString(boxplot.stats(Prestige$education)$out)))
# box plot for 'prestige'
boxplot(Prestige$prestige, main = "Prestige")
# Check the outliers for 'prestige'
boxplot(Prestige$prestige, main = "Prestige", sub = paste("Outlier values:", toString(boxplot.stats(Prestige$prestige)$out)))
```

## Check the normality of a potential response variable using a histogram

Create a histogram with the function `hist()`. You can change the number of bins using the `breaks=` argument.

## Check the normality of a potential response variable using a density plot

Compute a kernel density estimate with the function `density()` and pass the result to `plot()` to draw the density plot.

## Check the normality of a potential response variable using a QQ plot

Create a QQ plot with the function `qqnorm()`. A quantile-quantile plot (QQ plot) compares the quantiles of a given sample with the quantiles of the normal distribution; points that fall close to a straight line suggest that the sample is approximately normal.

## Check the normality of the response variable using a histogram

Create a histogram with the function `hist()`. You can change the number of bins using the `breaks=` argument.

## Check the normality of the response variable using a density plot

Compute a kernel density estimate with the function `density()` and pass the result to `plot()` to draw the density plot.

## Check the normality of the response variable using a QQ plot

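Create a QQ plot with the function `qqnorm()`. A quantile-quantile plot (QQ plot) compares the quantiles of a given sample with the quantiles of the normal distribution; points that fall close to a straight line suggest that the sample is approximately normal.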

## Fit a simple linear regression model

Fit a linear model with the function `lm()`. Regress the outcome variable `prestige` on the explanatory variable `education`.
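A possible starting point for this step, again assuming the `car` package is installed so that `Prestige` is available. The object name `fit_prestige` is arbitrary:

```r
if (requireNamespace("car", quietly = TRUE)) {
  library(car)  # attaches carData, which provides Prestige

  # Regress the prestige score on years of education
  fit_prestige <- lm(prestige ~ education, data = Prestige)

  # Estimated intercept and slope
  print(coef(fit_prestige))
} else {
  message("Package 'car' is not installed; run install.packages('car') first.")
}
```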

## Produce a summary of the model fitting

Produce a summary of the model fit using the `summary` function.