# MA251: Statistics II

Unit 1: Review   In this unit, we will review some of the basic concepts you learned in Introduction to Statistics (MA121), including probability distribution, hypothesis testing, analysis of variance (ANOVA), basic regression, and correlation. This unit serves as the foundation for the subsequent units. Feel free to skim through subunits in which you are confident of a thorough understanding of the topics covered.

In this unit, you will also learn to use R, a powerful statistical programming language. The New York Times has a fascinating article about R here. R is widely used by engineers, statisticians and scientists for data analysis. The main factor distinguishing R from other statistical and numerical languages like SAS, Matlab, Mathematica or SPLUS is that R is freely available to everybody. It is developed for scientists and maintained by scientists. Since its inception in 1996, the R-project has been constantly upgraded by many contributors, which expand the capabilities of the language through add-on packages.

This unit will take you approximately 31 hours to complete.

☐    Subunit 1.1: 5 hours

☐    Subunit 1.2: 8 hours

☐    Subunit 1.3: 7 hours

☐    Subunit 1.4: 11 hours

Unit1 Learning Outcomes
Upon successful completion of this unit, the student will be able to:

• perform calculations with binomial distribution, exponential distribution, Poisson distribution, and normal distribution;
• conduct hypothesis testing using student’s t-test and chi-square procedures;
• perform one-way and two-way ANOVAs;
• perform simple linear regression; and
• explain correlation and covariance.

1.1 Overview of Probability Distributions   - Reading: MIT: Dmitry Panchenko’s “Statistics for Applications” Lecture Notes: Lecture 1—“Overview of Some Probability Distributions” Link: MIT: Dmitry Panchenko’s “Statistics for Applications” Lecture Notes: Lecture 1—“Overview of Some Probability Distributions” (PDF)

Instructions: Download the PDF file of the lecture note of section #1, “Overview of Some Probability Distribution.” The reading provides an overview of several important discrete and continuous probability distributions, including binomial distribution, exponential distribution, Poisson distribution, and normal distribution. Under which conditions can a binomial distribution or a Poisson distribution be approximated by a normal distribution?

`````` Reading this lecture should take approximately 1 hour.

displayed on the webpage above.
``````
• Activity: The Comprehensive R Archive Network’s “Getting Started with R” Link: The Comprehensive R Archive Network’s “Getting Started with R (HTML)

Instructions: Click on the link above, which will take you to the official website for R: The Comprehensive R Archive Network. Download and install the appropriate version of R for your computer. The installation procedure is very straightforward. Once R is installed, go back to the website and click on “Manuals” under “Documentation,” which will take you to various R manuals. Click on “An Introduction to R” (either HTML or PDF) and read it. This manual will give you basic commands in R. Pay careful attention to chapters 7 and 8. Be sure to test out the R examples in the manual in the R command window to get used to R syntaxes.

Completing this activity should take approximately 4 hours.

• Activity: Dr. Alastair Sanderson’s “An Introduction to Using R” Link: Dr. Alastair Sanderson’s “An Introduction to Using R (HTML)

Instructions: Click on the above link. This webpage is a detailed tutorial for getting started with R. Copy and paste the R code from the webpage to the R command window. Make sure that you obtain similar results as shown on the webpage.

1.2 Overview of Tests for Statistical Significance   1.2.1 Basic Concepts   - Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 7: Tests of Statistical Significance: Three Overarching Concepts” Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics“Chapter 7: Tests of Statistical Significance: Three Overarching Concepts” (HTML)

`````` Instructions: Click on “Table of Contents” and read “Chapter 7:
Tests of Statistical Significance: Three Overarching Concepts.” The
reading will provide an overview of basic concepts in testing for
statistical significance, including mean chance expectation, null
hypothesis and research hypothesis, directional versus
nondirectional research hypotheses, and one-way versus two-way tests
of significance. What is the main difference between the null
hypothesis and the research hypothesis? Think of a situation that
you would prefer two-way testing over one-way testing. This section
highlights the importance of coming up with the right question
before figuring out which tool would be appropriate for answering
the question.

Reading this chapter should take approximtely 1 hour.

displayed on the webpage above.
``````

1.2.2 Chi-Square Procedures   - Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data” Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics“Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data” (HTML)

`````` Instructions: Click on “Table of Contents” and read “Chapter 8:
Chi-Square Procedures for the Analysis of Categorical Frequency
Data.” The reading will provide an overview of applications of
chi-square procedures. Chi-square procedures are often used for
tests of goodness of fit and tests of independence. What are the
limitations of chi-square procedures? Under which conditions is a
Fisher Exact Probability Test more suitable than a chi-square
test?

Reading this chapter should take approximately 1 hour.

displayed on the webpage above.
``````

1.2.3 Student’s t-tests   - Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 10. t-Procedures for Estimating the Mean of a Population,” “Chapter 11. t-Test for Two Independent Samples,” and “Chapter 12. t-Test for Two Correlated Samples” Links: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 10. t-Procedures for Estimating the Mean of a Population,” “Chapter 11. t-Test for Two Independent Samples,” and “Chapter 12. t-Test for Two Correlated Samples” (HTML)

Instructions: Click on “Table of Contents” and read “Chapter 10: t-Procedures for Estimating the Mean of a Population,” “Chapter 11: t-Test for Two Independent Samples,” and “Chapter 12: t-Test for Two Correlated Samples.” These readings will provide an overview of student’s t-distribution and applications of t-tests for independent and correlated samples. Student’s t distribution arises when estimating the mean of normally distributed random variables of a sample with small size and the “true” standard deviation is unknown.

`````` Reading these chapters should take approximately 1 hour.

displayed on the webpage above.
``````
• Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 14” Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 14” (HTML)

Instructions: Click on the link above and answer all questions in the quiz. Select your answer from choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.

Completing this quiz should take less than 1 hour.

• Assessment: The Saylor Foundation’s “One and Two-Sample Problems: Student’s T-Tests” Link: The Saylor Foundation’s “One and Two-Sample Problems: Student’s T-Tests (PDF)

Instructions: Complete the linked assessment, titled “One and two-sample problems: Student’s t-tests”. When you are done, check your work against The Saylor Foundation’s “Answer Key for One and two-sample problems: Student’s t-tests in unit 1.2.

Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for computer assessments.

• Assessment: The Saylor Foundation’s “Subunit 1.2.3 Assessment” Link: The Saylor Foundation’s “Subunit 1.2.3 Assessment”

Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

1.3 Overview of Analysis of Variance (ANOVA)   This subunit provides an overview of one-way and two-way analyses of variance (or ANOVAs). The purpose of analysis of variance (ANOVA) is to test for significant differences between means. In order to test for statistical significance between means, we actually analyze variances – hence the name “analysis of variance.” ANOVAs will enable you to examine the amount of variability in a response variable and/or understand where the variability is coming from. This unit will specifically teach you how to use the one-way ANOVA to test for differences between the means of several groups and to use the two-way ANOVA and interpret the interaction effect.

1.3.1 Basic Concepts   - Lecture: YouTube: Medical College of Wisconsin: Sergey Tarima’s “ANOVA: Comparing More Than Two Treatments” Link: YouTube: Medical College of Wisconsin: Sergey Tarima’s “ANOVA: Comparing More Than Two Treatments (YouTube). This content is also available in PDF format on the Medical College of Wisconsin's website  “ANOVA: Comparing More Than Two Treatments” (PDF)

`````` Instructions: Click on the link for the video for the lecture on
“ANOVA: Comparing More Than Two Treatments.” You may also want to
lecture, Sergey Tarima discusses applications of one-way and two-way
ANOVA in medical research. The lecture will cover content for
subunits 1.3.1-1.3.3.

Watching this lecture and pausing to take notes should take
approximately 1 hour.

displayed on the webpage above.
``````
• Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 13: Conceptual Introduction to the Analysis of Variance”

Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 13: Conceptual Introduction to the Analysis of Variance” (HTML)

Instructions: Click on “Table of Contents” and read “Chapter 13: Conceptual Introduction to the Analysis of Variance.” The reading will provide an overview of basic concepts in analysis of variance. The fundamental idea of ANOVA is that the observed variance of a variable can be decomposed into different sources of variations. Different ways of partitioning sources of variations are referred to as “statistical models.” The ANOVA depends on F-statistics, that is, the ratio of the variance of the means to the variance within the samples. What is the relationship between F-distribution and t-distribution?

Reading this chapter should take approximately 1 hour.

1.3.2 One-Way ANOVA   Note: This subunit is covered by the video lecture assigned beneath subunit 1.3.1. The video lecture will provide an example illustrating the use of one-way ANOVA in medical research.

• Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 15: One-Way Analysis of Variance for Correlated Samples” Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 15: One-Way Analysis of Variance for Correlated Samples” (HTML)

Instructions: Click on “Table of Contents” and read “Chapter 15: One-Way Analysis of Variance for Correlated Samples.” The reading provides an overview of one-way ANOVA for correlated samples, which can be considered to be an extension of correlated-samples t-test.

Reading this chapter should take approximately 1 hour.

• Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 14: One-Way Analysis of Variance for Independent Samples” Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics“Chapter 14: One-Way Analysis of Variance for Independent Samples” (HTML)

Instructions: Click on “Table of Contents” and read “Chapter 14: One-Way Analysis of Variance for Independent Samples.” The reading will provide an overview of using one-way ANOVA to compare means of two or more sampled using the F distribution for independent samples. The ANOVA tests the null hypothesis of the samples in two or more groups being drawn from the same population. What are the basic assumptions of one-way ANOVA? Why should the variance of the group means be lower than the variance of the samples?

Reading this chapter should take approximately 1 hour.

1.3.3 Two-Way ANOVA   Note: This subunit is covered by the video lecture assigned beneath subunit 1.3.1. The second half of the video focuses on the use of two-way ANOVA in medical research.

• Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 16: Two-Way Analysis of Variance for Independent Samples” Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics“Chapter 16: Two-Way Analysis of Variance for Independent Samples (HTML)

Instructions: Click on “Table of Contents” and read “Chapter 16: Two-Way Analysis of Variance for Independent Samples.” Two-way ANOVA is an extension of the one-way ANOVA to examine the influence of different categorical independent variables on one dependent variable.

Reading this chapter should take approximately 1 hour.

• Assessment: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 1” Link: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 1 (PDF)

Instructions: Click on the link above and scroll down to Homework 1 to complete this assignment. Follow the instructions for the problems closely, particularly the R-based assignments. The solutions to the homework are in the pdf file and the R code.

Completing this assessment should take approximately 3 hours.

• Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 10” Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 10” (HTML)

Instructions: Click on the link above and answer all questions in the quiz. Select your answers from choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.

Completing this quiz should take less than 1 hour.

• Activity: John M Quick’s “R Tutorial Series: Two-Way ANOVA with Simple Interactions with Simple Main Effects” Link: John M Quick’s “R Tutorial Series: Two-Way ANOVA with Simple Interactions with Simple Main Effects (HTML)

Instructions: Click on the link and follow the instructions. This webpage contains a detailed tutorial for how to perform two-way ANOVA in R, it is attributed to John M Quick. Copy and paste the R code from the webpage to the R command window. Make sure that you obtain similar results as shown on the webpage.

Optional: Feel free to further explore other ANOVA tutorials:

• One-Way Omnibus ANOVA (HTML)

http://rtutorialseries.blogspot.com/2010/10/r-tutorial-series-one-way-omnibus-anova.html
- One-Way ANOVA with Comparisons (HTML)

http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-one-way-anova-with.html
- Two-Way ANOVA with Unequal Sample Sizes (HTML)

on the webpage above

• Assessment: The Saylor Foundation’s “Subunit 1.3.3 Assessment” Link: The Saylor Foundation’s “Subunit 1.3.3 Assessment”

Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

1.4 Overview of Regression   This subunit provides an overview of linear regression or the method of using one variable to predict another variable using a linear function (i.e. a straight line).  You will learn to calculate regression coefficients, make inferences about the slope and correlation coefficient, estimate mean values, and predict individual values.

1.4.1 Regression Basics   - Reading: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” Link: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” (PDF)

`````` Instructions: Download the PDF file. Open the file and browse
to“Chapter 8: Regression Basics.” Read pages 70-79. Linear
regression models the relationship between a dependent variable (the
response) and an independent variable (the cause) by fitting a
linear equation to observed data. Define outliners and influential
observations in a sample. How do you use F-score for testing a
regression?

Reading this chapter should take approximately 1 hour.

attributed to Thomas K. Tieman, and the original version can be
``````

Instructions: This video provides a brief overview of regression. This lecture is optional if you already have a good understanding of regression.

Watching this video should take approximately 10 minutes.

1.4.2 Correlation and Covariance   - Reading: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” Link: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” (PDF)

`````` Instructions: Download the PDF file. Open the file and browse to
“Chapter 8: Regression Basics.” Read pages 79-82. Covariance
measures how much two random variables change together. How does
correlation relate to covariance? What are the connections among
regression, correlation, and covariance?

Reading this chapter should take approximately 30 minutes.

attributed to Thomas K. Tieman, and the original version can be
``````

1.4.3 Traps and Pitfalls of Regression   - Reading: University of Otago: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 13: Traps and Pitfalls of Regression Analysis” Link: University of Otago: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 13: Traps and Pitfalls of Regression Analysis” (PDF)

`````` Instructions: Click on the link “Download the book as a PDF file”
throughout the course. Read “Chapter 13: Traps and Pitfalls of
Regression Analysis.” What does “regression towards the mean”
actually mean?

Reading this chapter should take approximately 1 hour.

displayed on the webpage above.
``````
• Assessment: Massachusetts Institute of Technology: Cynthia Rudin’s “Statistical Thinking and Data Analysis Exam 4” Link: Massachusetts Institute of Technology: Cynthia Rudin’s “Statistical Thinking and Data Analysis Exam 4” (PDF)

Instructions: Answer question 3, on page 4 of the exam.

Answering this question should take approximately 1 hour.

• Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 11”

Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 11” (HTML)

Instructions: Click on the link above and answer all questions in the quiz. Select your answer from the choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.

Completing this quiz should take approximately 1 hour.