MA251: Statistics II

Unit 6: Introduction to Advanced Topics

This unit introduces you to a wide range of techniques used in modern statistics to address the rapidly growing need to analyze large-scale data and make predictions. Advances in information technology have led to an explosion of data in all aspects of our lives. Consequently, statistics and predictive analytics have become a major driving force for scientific discoveries, technological advances, and improvements in quality of life in the twenty-first century. You will learn about survival analysis, time series analysis, principal component analysis, structural equation models, and support vector machines. While each of the topics discussed in this unit deserves a separate course, the objective of this unit is to acquaint you with these topics and provide some basic guidance so that you can study them in more depth in the future.

Unit 6 Time Advisory
This unit will take you approximately 68 hours to complete.

☐    Subunit 6.1: 10 hours

☐    Subunit 6.2: 15 hours

☐    Subunit 6.3: 9 hours

☐    Subunit 6.4: 9 hours

☐    Subunit 6.5: 10 hours

☐    Subunit 6.6: 15 hours

Unit 6 Learning Outcomes
Upon successful completion of this unit, the student will be able to:

• perform Principal Component Analysis (PCA);
• explain Structural Equation Models;
• perform survival analysis and time-series analysis; and
• explain the principles of classification algorithms used for data mining, such as support vector machines.

6.1 Factor Analysis

• Reading: Lakehead University: Bruce Weaver’s “Chapter 7: Factor Analysis” Link: Lakehead University: Bruce Weaver’s “Chapter 7: Factor Analysis” (PDF)

Instructions: Download the PDF file “pcafa.pdf” from the webpage (bullet #12 under “My own notes”) and read the whole chapter. In this reading, you will learn about factor analysis and principal component analysis.

Reading this chapter should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

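To make the idea of a principal component concrete before you start the reading, the following minimal sketch (not taken from the assigned notes) computes the principal components of a small two-variable data set from its sample covariance matrix. It is written in plain Python for illustration only; the course assignments themselves use R. The data values and the function name `pca_2d` are made up for this example.

```python
import math

def pca_2d(points):
    """Return ((lam1, lam2), first_component) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] with divisor n - 1.
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # Eigenvector belonging to the largest eigenvalue.
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)

# Made-up illustrative data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
eigenvalues, pc1 = pca_2d(data)
explained = eigenvalues[0] / sum(eigenvalues)
print("First principal component direction:", pc1)
print("Share of variance explained by PC1:", round(explained, 3))
```

Because the two variables are almost perfectly correlated, the first component should point roughly along the diagonal and account for nearly all of the variance. In R, the built-in function `prcomp` performs the same kind of analysis for any number of variables.
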
• Reading: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II” Link: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II” (PDF)

Instructions: Download the PDF files for “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II.” In these readings, you will learn to identify when a factor analysis is appropriate and when it is not, perform a one-factor and multi-factor analysis, and interpret the results from a factor analysis. These readings supplement Dr. Bruce Weaver’s lecture notes on factor analysis.

Reading through these lectures should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Reading: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R” Link: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R” (PDF)

Instructions: Download the PDF file for “Factor Analysis in R” located in the first row and third column of the table at the bottom of the webpage. In this reading, you will explore several R functions related to factor analysis.

Reading this assignment should take approximately 2 hours and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Lecture: Stanford University: Andrew Ng’s: “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis” Link: Stanford University: Andrew Ng’s: “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis” (YouTube, iTunes, and MP4)

Instructions: Please watch the video for Lecture 14. In this video, Andrew Ng discusses the derivation for factor analysis as well as principal component analysis. You may also download the transcript of the lecture (PDF or HTML).

Watching this lecture should take approximately 1 hour and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: The Saylor Foundation’s “Factor Analysis Assessment” Link: The Saylor Foundation’s “Factor Analysis Assessment” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.1 (PDF).

Completing this assessment should take less than 2 hours.

• Assessment: The Saylor Foundation’s “Principal Component Analysis” Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)

Instructions: Complete the linked assessment, titled “Principal Component Analysis.” When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

6.2 Structural-Equation Models

• Reading: McMaster University: John Fox’s “Structural Equation Models” Link: McMaster University: John Fox’s “Structural Equation Models” (PDF)

Instructions: Browse to “Lecture Notes and R Scripts” at the bottom of the webpage and download the PDF file for “Structural Equation Models.” This reading will provide an overview of structural-equation models (SEMs). An important feature of SEMs is their ability to deal with a variety of models for the analysis of latent variables. SEMs incorporate independent and dependent variables and latent constructs that clusters of observed variables might represent. SEM enables hypothesis testing when experiments are not possible.

Reading this lecture should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9” Link: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9” (PDF)

Instructions: Click on the link above, download the PDF file for Homework 9 (hw-09.pdf), and complete all problems in the homework. Follow the instructions for the problems closely. The solutions to the homework are in the PDF file, solutions-09.pdf.

Completing this assessment should take approximately 6 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: The Saylor Foundation’s “Principal Component Analysis” Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

• Assessment: The Saylor Foundation’s “Structural Equation Modeling Assessment” Link: The Saylor Foundation’s “Structural Equation Modeling Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.2 (PDF).

Completing this assignment should take you no longer than 2 hours.

6.3 Survival Analysis

• Lecture: Medical College of Wisconsin: John Klein’s “Introduction to Survival Analysis” Link: Medical College of Wisconsin: John Klein’s “Introduction to Survival Analysis” (YouTube and PDF)

Instructions: Click on the lecture “Uses and Abuses of Non-parametric Statistics” and watch the video. You may also want to download the presentation (in PDF) used for the lecture. In this video, John Klein provides an overview of survival analysis, including Kaplan-Meier estimation and competing risks.

Watching this lecture should take approximately 1 hour.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

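As a preview of Kaplan-Meier estimation, the following minimal sketch computes the product-limit estimate of the survival function from right-censored data. It is written in plain Python for illustration only (the course assignments use R), it is not taken from the lecture, and the follow-up times, event indicators, and function name `kaplan_meier` are made up for this example. It assumes the common convention that a subject censored at time t is still counted as at risk at time t.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects who share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= leaving
    return curve

# Made-up data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects never cause a drop in the curve, but they do shrink the risk set for later event times. In R, the `survfit` function in the `survival` package produces this kind of estimate.
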
• Reading: McMaster University: John Fox’s “Survival Analysis” Link: McMaster University: John Fox’s “Survival Analysis” (PDF)

Instructions: Scroll down to “Lecture Notes and R Scripts” at the bottom of the webpage and download the PDF file for “Survival Analysis.”

Reading these lecture notes should take approximately 3 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: The Saylor Foundation’s “Survival Analysis”

Link: The Saylor Foundation’s “Survival Analysis” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Survival Analysis” (PDF) in subunit 6.3.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

• Assessment: The Saylor Foundation’s “Unit 6.3 Assessment” Link: The Saylor Foundation’s “Unit 6.3 Assessment” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.3 (PDF).

Completing this assessment should take you no longer than 2 hours.

6.4 Multilevel (Hierarchical) Models

6.4.1 Introduction to Multilevel Modeling

• Reading: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 1: Introduction” Link: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 1: Introduction” (PDF)

Instructions: Click on the link for “Chapter 1: Introduction” and download the PDF file. This chapter will introduce you to multilevel models, which are also known as hierarchical linear models, nested models, mixed models, random-coefficient models, or random-effects models. These types of models account for variation in parameters at multiple levels.

Reading this chapter should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.4.2 A Basic Linear Multilevel Model

• Reading: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 2: The basic linear multilevel model and its estimation” Link: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 2: The basic linear multilevel model and its estimation” (PDF)

Instructions: Click on the link and download the PDF file for Chapter 2. This chapter will walk you through a simple two-level linear regression model.

Reading this chapter should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: The Saylor Foundation’s “Multilevel Modeling” Link: The Saylor Foundation’s “Multilevel Modeling” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Multilevel Modeling” (PDF) in subunit 6.5.

Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

• Assessment: The Saylor Foundation’s “Meta-Analysis Assessment” Link: The Saylor Foundation’s “Meta-Analysis Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.4 (PDF).

Completing this assignment should take you no longer than 2 hours.

6.5 Longitudinal Data Analysis

6.5.1 Overview of Longitudinal Data Analysis

• Reading: Marie Davidian’s “Introduction to Modeling and Analysis of Longitudinal Data” Link: Marie Davidian’s “Introduction to Modeling and Analysis of Longitudinal Data” (PDF)

Instructions: Click on the link for “Introduction to Modeling and Analysis of Longitudinal Data” (Introductory Lecture Session at the 2006 ENAR Spring Meeting, March 2006) and download the PDF file. The slide deck will provide an overview of longitudinal data analysis.

Reading this lecture should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.5.2 Time-Series Analysis

• Reading: Engineering Statistics Handbook: “Section 6.4: Introduction to Time Series Analysis” Link: Engineering Statistics Handbook: “Section 6.4: Introduction to Time Series Analysis” (HTML)

Instructions: Read sections 6.4.1-6.4.3 on the webpage.

Reading this section should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Activity: Middle East Technical University: G.P. Nason’s “Introduction to R for Times Series Analysis” Link: Middle East Technical University: G.P. Nason’s “Introduction to R for Times Series Analysis” (PDF)

Instructions: Click on the link above and download the PDF file “Introduction to R for Times Series Analysis” in the “R Help” section. This is a tutorial on time series analysis in R. Follow the instructions given in the tutorial. Pay careful attention to Section 3, which will guide you step by step through fitting an autoregressive integrated moving average (ARIMA) model.

Completing this tutorial should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
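
Before you work through the tutorial, it may help to see the simplest member of the ARIMA family in action. The following minimal sketch simulates a first-order autoregressive series, AR(1), which is the ARIMA(1, 0, 0) special case, and recovers its coefficient by least squares. It is written in plain Python for illustration only (the tutorial itself uses R), it is not taken from the tutorial, and the function names, the coefficient 0.6, the series length, and the random seed are all assumptions made for this example.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def estimate_ar1(x):
    """Least-squares estimate of phi from regressing x_t on x_{t-1} (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

series = simulate_ar1(phi=0.6, n=2000)
phi_hat = estimate_ar1(series)
print("Estimated AR(1) coefficient:", round(phi_hat, 3))
```

With a series this long, the estimate should land close to the true coefficient of 0.6. Full ARIMA fitting adds differencing and moving-average terms and is normally done with a dedicated routine such as R's `arima` function rather than by hand.
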

6.5.3 Case Study: Nonlinear Mixed Effects Models for Pharmacokinetic and Pharmacodynamic Analysis

• Reading: Marie Davidian’s “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis” Link: Marie Davidian’s “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis” (PDF)

Instructions: Click on the link for “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis” (ASA Biopharmaceutical Section webinar, April 2010) and download the PDF file. The slide deck will introduce you to the use of multilevel modeling for analyzing longitudinal data in pharmacokinetics and pharmacodynamics. This case study builds on what you learned in subunits 6.4 and 6.5.1.

Reading this study should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6 Data Mining and Machine Learning

6.6.1 Introduction

• Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 1: Introduction” Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 1: Introduction” (PowerPoint)

Instructions: Click on the link for “Chapter 1: Introduction” and download the slide deck. The slide deck will provide an overview of data mining.

Reading through this slide show should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Lecture: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1” Link: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1” (YouTube, iTunes, and MP4)

Instructions: Watch the video for Lecture 1. In this video, Andrew Ng discusses the basic concepts and applications of machine learning. You may also download the transcript of the lecture (PDF or HTML). This lecture series is one of the best resources on machine learning available on the web. Consider watching other videos in the series if you want to learn more about machine learning.

Watching this lecture should take approximately 1 hour and 15 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6.2 Classification: Overview and Basics

• Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 8: Classification: Basic Concepts” Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 8: Classification: Basic Concepts” (PowerPoint)

Instructions: Click on the link for “Chapter 8: Classification: Basic Concepts” and download the slide deck. The slide deck will provide an overview of the concepts of classification used in data mining, including supervised versus unsupervised learning and classification versus numeric prediction. You will also learn about basic classification techniques such as decision tree induction and Bayes classification.

Reading this chapter should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6.3 Advanced Classification Methods: Bayesian Belief Networks, Neural Networks, and Support Vector Machines (SVM)

• Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 9: Classification: Advanced Methods” Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 9: Classification: Advanced Methods” (PowerPoint)

Instructions: Click on the link for “Chapter 9: Classification: Advanced Methods” and download the slide deck. The slide deck will provide an overview of advanced statistical methods developed for classification, including neural networks, support vector machines (SVM), the k-nearest neighbor (kNN) algorithm, and genetic algorithms. As you read, consider the following questions: Which scenarios are best for training Bayesian networks? What are the strengths and weaknesses of a neural network as a classifier? In SVM, what is the marginal hyperplane?

Reading this chapter should take approximately 1 hour.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

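To help you picture the margin in a linear SVM, the following minimal sketch checks a candidate separating hyperplane w·x + b = 0 against a tiny labeled data set, computes the margin width 2/||w||, and lists the points that lie exactly on the margin boundaries (the support vectors). It is written in plain Python for illustration only (the course assignments use R) and is not taken from the slide deck. The data points are made up, and the weights `w` and bias `b` are chosen by hand rather than fitted by an SVM solver.

```python
import math

# Made-up training points with class labels +1 / -1.
points = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((2.0, 2.0), 1), ((3.0, 3.0), 1)]

# Hand-chosen hyperplane w.x + b = 0 (an assumption, not a fitted model).
w = (1.0, 1.0)
b = -3.0

def decision_value(x):
    """Signed score w.x + b; its sign is the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b

# Every point should satisfy y * (w.x + b) >= 1 for a hard-margin separator.
all_separated = all(y * decision_value(x) >= 1.0 - 1e-9 for x, y in points)

# Width of the band between the boundaries w.x + b = +1 and w.x + b = -1.
margin_width = 2.0 / math.hypot(*w)

# Support vectors sit exactly on one of the two margin boundaries.
support_vectors = [x for x, y in points if abs(y * decision_value(x) - 1.0) < 1e-9]

print("All points separated with margin:", all_separated)
print("Margin width:", round(margin_width, 4))
print("Support vectors:", support_vectors)
```

Only the two closest points of opposite classes determine the margin here; moving the outer points farther away would not change it. A real SVM solver searches for the `w` and `b` that make this margin as wide as possible.
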
• Activity: Technical University of Wien: David Meyer’s “Tutorial for Support Vector Machines in R” Link: Technical University of Wien: David Meyer’s “Tutorial for Support Vector Machines in R” (PDF)

Instructions: Click on the link above and download the PDF file “svmdoc.pdf”. This is a tutorial on using support vector machines in R. Follow the instructions given in the tutorial closely. You will need to install and load the “e1071” package for this tutorial.

Completing this tutorial should take approximately 3 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6.4 Introduction to Bayesian Inference

• Lecture: Cambridge University: Christopher Bishop’s “Introduction To Bayesian Inference” Link: Cambridge University: Christopher Bishop’s “Introduction To Bayesian Inference” (Adobe Flash)

Instructions: Watch the video for Lecture 1. In this video, Christopher Bishop provides a brief overview of the past, present, and future of machine learning. You will also learn about Bayesian inference, the foundation of the third generation of machine learning techniques.

Watching this lecture should take approximately 1 hour and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

• Assessment: The Saylor Foundation’s “Classification Techniques” Link: The Saylor Foundation’s “Classification Techniques” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” (PDF) for subunit 6.6 (which was written in R).

Completing this assessment should take you no longer than 4 hours.

• Assessment: The Saylor Foundation’s “Unit 6.6 Assessment” Link: The Saylor Foundation’s “Unit 6.6 Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.6 (PDF).

Completing this assignment should take you no longer than 2 hours.

Unit 6 Assessment

• Assessment: The Saylor Foundation’s “Unit 6 Assessment” Link: The Saylor Foundation’s “Unit 6 Assessment”

Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.