**Unit 6: Introduction to Advanced Topics**
*This unit introduces you to a wide range of techniques used in modern
statistics to address the rapidly increasing need for analyzing
large-scale data and making predictions. Advances in information
technology have led to an explosion of data in all aspects of our lives.
Consequently, statistics and predictive analytics have become a major
driving force for scientific discoveries, technological advances, and
improvements in quality of life in the twenty-first century. You will
learn about survival analysis, time series analysis, principal component
analysis, structural equation models, and support vector machines. While
each of the topics discussed in this unit deserves a separate course,
the objective of this unit is to get you acquainted with these topics
and provide you with some basic guidance so you can study these topics
in more depth in the future.*

**Unit 6 Time Advisory**

This unit will take you approximately 68 hours to complete.

☐ Subunit 6.1: 10 hours

☐ Subunit 6.2: 15 hours

☐ Subunit 6.3: 9 hours

☐ Subunit 6.4: 9 hours

☐ Subunit 6.5: 10 hours

☐ Subunit 6.6: 15 hours

**Unit 6 Learning Outcomes**

Upon successful completion of this unit, the student will be able to:

- perform Principal Component Analysis (PCA);
- explain Structural Equation Models;
- perform survival analysis and time-series analysis; and
- explain the principles of classification algorithms used for data mining such as a support vector machine.

**6.1 Factor Analysis**
- **Reading: Lakehead University: Bruce Weaver’s “Chapter 7: Factor
Analysis”**
Link: Lakehead University: Bruce Weaver’s “Chapter 7: Factor
Analysis”
(PDF)

Instructions: Download the PDF file “pcafa.pdf” from the webpage (bullet #12 under “My own notes”) and read the whole chapter. In this reading, you will learn about factor analysis and principal component analysis.

Reading this chapter should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Reading: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II”**
Link: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II” (PDF)

Instructions: Download the PDF files for “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II.” In these readings, you will learn to identify when a factor analysis is appropriate and when it is not, perform one-factor and multi-factor analyses, and interpret the results of a factor analysis. These readings supplement Dr. Bruce Weaver’s lecture notes on factor analysis.

Reading through these lectures should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Reading: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R”**
Link: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R” (PDF)

Instructions: Download the PDF file for “Factor Analysis in R” located in the first row and third column of the table at the bottom of the webpage. In this reading, you will explore several R functions related to factor analysis.

Completing this reading should take approximately 2 hours and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Lecture: Stanford University: Andrew Ng’s “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis”**
Link: Stanford University: Andrew Ng’s “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis” (YouTube, iTunes, and MP4)

Instructions: Please watch the video for Lecture 14. In this video, Andrew Ng discusses the derivation of factor analysis as well as principal component analysis. You may also download the transcript of the lecture (PDF or HTML).

Watching this lecture should take approximately 1 hour and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
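
As a rough companion to the principal component analysis covered in the readings and lecture above, the sketch below computes the two principal components of a small two-variable data set by hand. It is written in Python using only the standard library (the course assignments themselves use R), the function name `pca_2d` and the toy data are hypothetical, and the closed-form eigenvalue formula applies only to the 2×2 covariance matrix assumed here.

```python
import math

def pca_2d(xs, ys):
    """PCA for two variables via the 2x2 sample covariance matrix.

    Returns (largest eigenvalue, smallest eigenvalue,
             unit vector of the first principal component,
             proportion of variance explained by the first component).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (divisor n - 1)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]
    center = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = center + spread, center - spread
    # Direction of the first principal component (major axis)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(angle), math.sin(angle))
    explained = lam1 / (lam1 + lam2)
    return lam1, lam2, pc1, explained

# Hypothetical toy data: y is exactly twice x, so one component
# should capture all of the variance.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
lam1, lam2, pc1, explained = pca_2d(xs, ys)
print(lam1, pc1, explained)
```

For more than two variables, an eigendecomposition routine from a numerical library would replace the closed-form step.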

**Assessment: The Saylor Foundation’s “Factor Analysis Assessment”**
Link: The Saylor Foundation’s “Factor Analysis Assessment” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.1 (PDF).

Completing this assessment should take less than 2 hours.

**Assessment: The Saylor Foundation’s “Principal Component Analysis”**
Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)

Instructions: Complete the linked assessment, titled “Principal Component Analysis.” When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

**6.2 Structural-Equation Models**
- **Reading: McMaster University: John Fox’s “Structural Equation
Models”**
Link: McMaster University: John Fox’s “Structural Equation
Models”
(PDF)

Instructions: Browse to “Lecture Notes and R Scripts” at the bottom
of the webpage and download the PDF file for “Structural Equation
Models.” This reading will provide an overview of
structural-equation models (SEMs). An important feature of SEMs is
their ability to deal with a variety of models for the analysis of
latent variables. SEMs incorporate independent and dependent
variables and latent constructs that clusters of observed variables
might represent. SEM enables hypothesis testing when experiments are
not possible.

Reading this lecture should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Assessment: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9”**
Link: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9” (PDF)

Instructions: Click on the link above, download the PDF file for Homework 9 (hw-09.pdf), and complete all problems in the homework. Follow the instructions for the problems closely. The solutions to the homework are in the PDF file solutions-09.pdf.

Completing this assessment should take approximately 6 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Assessment: The Saylor Foundation’s “Principal Component Analysis”**
Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

**Assessment: The Saylor Foundation’s “Structural Equation Modeling Assessment”**
Link: The Saylor Foundation’s “Structural Equation Modeling Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.2 (PDF).

Completing this assignment should take you no longer than 2 hours.

**6.3 Survival Analysis**
- **Lecture: Medical College of Wisconsin: John Klein’s “Introduction
to Survival Analysis”**
Link: Medical College of Wisconsin: John Klein’s “Introduction to
Survival
Analysis”
(YouTube and PDF)

Instructions: Click on the lecture “Uses and Abuses of Non-parametric Statistics” and watch the video. You may also want to download the presentation (in PDF) used for the lecture. In this video, John Klein provides an overview of survival analysis, including Kaplan-Meier estimation and competing risks.

Watching this lecture should take approximately 1 hour.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
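
To make the Kaplan-Meier estimation mentioned in the lecture description concrete, here is a minimal sketch of the product-limit calculation. It is written in Python with only the standard library (the course assignments use R), and the function name `kaplan_meier` and the toy follow-up data are hypothetical. It assumes right-censoring only and does not handle competing risks or confidence intervals.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # subjects leave the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects at t are both removed afterwards
        at_risk -= leaving[t]
    return curve

# Hypothetical toy data: five subjects, two of them censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

A subject censored at the same time as an event is counted as at risk for that event, which is the usual convention.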

**Reading: McMaster University: John Fox’s “Survival Analysis”**
Link: McMaster University: John Fox’s “Survival Analysis” (PDF)

Instructions: Scroll down to “Lecture Notes and R Scripts” at the bottom of the webpage and download the PDF file for “Survival Analysis.”

Reading these lecture notes should take approximately 3 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Assessment: The Saylor Foundation’s “Survival Analysis”**
Link: The Saylor Foundation’s “Survival Analysis” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Survival Analysis” (PDF) in subunit 6.3.

Completing this assessment should take you no longer than 5 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

**Assessment: The Saylor Foundation’s “Unit 6.3 Assessment”**
Link: The Saylor Foundation’s “Unit 6.3 Assessment” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.3 (PDF).

Completing this assessment should take you no longer than 2 hours.

**6.4 Multilevel (Hierarchical) Models**
**6.4.1 Introduction to Multilevel Modeling**
- **Reading: Harvey Goldstein’s Multilevel Statistical Models:
“Chapter 1: Introduction”**
Link: Harvey Goldstein’s *Multilevel Statistical Models*: “Chapter 1: Introduction” (PDF)

Instructions: Click on the link for “Chapter 1: Introduction” and download the PDF file. This chapter will introduce you to multilevel models, which are also known as hierarchical linear models, nested models, mixed models, random-coefficient models, or random-effects models. These models account for variation in parameters at multiple levels.

Reading this chapter should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.4.2 A Basic Linear Multilevel Model**
- **Reading: Harvey Goldstein’s Multilevel Statistical Models:
“Chapter 2: The basic linear multilevel model and its estimation”**
Link: Harvey Goldstein’s *Multilevel Statistical Models*: “Chapter 2: The basic linear multilevel model and its estimation” (PDF)

Instructions: Click on the link and download the PDF file for Chapter 2. This chapter will walk you through a simple two-level linear regression model.

Reading this chapter should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Assessment: The Saylor Foundation’s “Multilevel Modeling”**
Link: The Saylor Foundation’s “Multilevel Modeling” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Multilevel Modeling” (PDF) in subunit 6.5.

Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

**Assessment: The Saylor Foundation’s “Meta-Analysis Assessment”**
Link: The Saylor Foundation’s “Meta-Analysis Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.4 (PDF).

Completing this assignment should take you no longer than 2 hours.

**6.5 Longitudinal Data Analysis**
**6.5.1 Overview of Longitudinal Data Analysis**
- **Reading: Marie Davidian’s “Introduction to Modeling and Analysis
of Longitudinal Data”**
Link: Marie Davidian’s “Introduction to Modeling and Analysis of
Longitudinal
Data”
(PDF)

Instructions: Click on the link for “Introduction to Modeling and
Analysis of Longitudinal Data” (Introductory Lecture Session at the
2006 ENAR Spring Meeting, March 2006) and download the PDF file. The
slide deck will provide an overview of longitudinal data
analysis.

Reading this lecture should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.5.2 Time-Series Analysis**
- **Reading: Engineering Statistics Handbook: “Section 6.4:
Introduction to Time Series Analysis”**
Link: Engineering Statistics Handbook: “Section 6.4: Introduction
to Time Series
Analysis”
(HTML)

Instructions: Read sections 6.4.1-6.4.3 on the webpage.

Reading this section should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Activity: Middle East Technical University: G. P. Nason’s “Introduction to R for Times Series Analysis”**
Link: Middle East Technical University: G. P. Nason’s “Introduction to R for Times Series Analysis” (PDF)

Instructions: Click on the link above and download the PDF file “Introduction to R for Times Series Analysis” in the “R Help” section. This is a tutorial on time series analysis in R. Follow the instructions given in the tutorial. Pay careful attention to Section 3, which will guide you step by step through fitting autoregressive integrated moving average (ARIMA) models.

Working through this tutorial should take approximately 4 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
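
To give a feel for what an ARIMA model does before you work through the tutorial, the sketch below fits the simplest nontrivial case by hand: difference the series once (the "integrated" part), then estimate a single autoregressive coefficient on the differences. It is written in Python with only the standard library rather than R, it is not the procedure from the tutorial, and the function names and toy series are hypothetical. It assumes an ARIMA(1,1,0) model with no intercept and no moving-average term.

```python
def difference(series):
    """First differences: the 'I' (integrated) step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares estimate of phi in values[t] = phi * values[t-1] + noise.

    Assumes a zero-mean series, so no intercept is estimated.
    """
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den

def forecast_next(series):
    """One-step-ahead forecast under the assumed ARIMA(1,1,0) model."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1], phi

# Hypothetical noise-free toy series whose increments halve at each step
series = [10, 18, 22, 24, 25, 25.5]
next_value, phi = forecast_next(series)
print(phi, next_value)
```

With real, noisy data the estimate would not be exact, and model order would normally be chosen with diagnostics such as autocorrelation plots.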

**6.5.3 Case Study: Nonlinear Mixed Effects Models for Pharmacokinetic
and Pharmacodynamic Analysis**
- **Reading: Marie Davidian’s “An Introduction to Nonlinear Mixed
Effects Models and PK/PD Analysis”**
Link: Marie Davidian’s “An Introduction to Nonlinear Mixed Effects
Models and PK/PD Analysis” (PDF)

Instructions: Click on the link for “An Introduction to Nonlinear
Mixed Effects Models and PK/PD Analysis” (ASA Biopharmaceutical
Section webinar, April 2010) and download the PDF file. The slide
deck will introduce you to the use of multilevel modeling for
analyzing longitudinal data in the areas of pharmacokinetics and
pharmacodynamics. This case study builds on what you learned in
subunits 6.4 and 6.5.1.

Reading this study should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.6 Data Mining and Machine Learning**
**6.6.1 Introduction**
- **Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data
Mining: Concepts and Techniques: “Chapter 1: Introduction”**
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s *Data Mining: Concepts and Techniques*: “Chapter 1: Introduction” (PowerPoint)

Instructions: Click on the link for “Chapter 1: Introduction” and download the slide deck. The slide deck will provide an overview of data mining.

Reading through this slide show should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Lecture: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1”**
Link: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1” (YouTube, iTunes, and MP4)

Instructions: Watch the video for Lecture 1. In this video, Andrew Ng discusses the basic concepts and applications of machine learning. You may also download the transcript of the lecture (PDF or HTML). This lecture series is one of the best resources on machine learning available on the web. Consider watching other videos in the series if you want to learn more about machine learning.

Watching this lecture should take approximately 1 hour and 15 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.6.2 Classification: Overview and Basics**
- **Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data
Mining: Concepts and Techniques: “Chapter 8: Classification: Basic
Concepts”**
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s *Data Mining: Concepts and Techniques*: “Chapter 8: Classification: Basic Concepts” (PowerPoint)

Instructions: Click on the link for “Chapter 8: Classification: Basic Concepts” and download the slide deck. The slide deck will provide an overview of the concepts of classification used in data mining, including supervised versus unsupervised learning and classification versus numeric prediction. You will also learn about basic classification techniques such as decision tree induction and Bayes classification.

Reading this chapter should take approximately 2 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.6.3 Advanced Classification Methods: Bayesian Belief Networks,
Neural Networks, and Support Vector Machines (SVMs)**
- **Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data
Mining: Concepts and Techniques: “Chapter 9: Classification:
Advanced Methods”**
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s *Data Mining: Concepts and Techniques*: “Chapter 9: Classification: Advanced Methods” (PowerPoint)

Instructions: Click on the link for “Chapter 9: Classification: Advanced Methods” and download the slide deck. The slide deck will provide an overview of advanced statistical methods developed for classification, including neural networks, support vector machines (SVMs), the k-nearest-neighbor (kNN) algorithm, and genetic algorithms. As you read, consider the following questions: Which scenarios are best for training Bayesian networks? What are the strengths and weaknesses of a neural network as a classifier? In an SVM, what is the marginal hyperplane?

Reading this chapter should take approximately 1 hour.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
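
As a small aid for the question about SVM margins, the sketch below takes a linear separator that is simply given (it is not trained here) and computes the geometric margin, along with the points that lie on the two hyperplanes bounding the margin. It is written in Python with only the standard library rather than R, and the weights, bias, points, and function names are all hypothetical. It assumes the separator is in canonical form, meaning the closest points satisfy y(w·x + b) = 1.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b): positive when x is on the correct side."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, points, labels):
    """Smallest distance from any point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(functional_margin(w, b, x, y) / norm
               for x, y in zip(points, labels))

def support_vectors(w, b, points, labels, tol=1e-9):
    """Points on the margin boundaries w . x + b = +1 or -1."""
    return [x for x, y in zip(points, labels)
            if abs(functional_margin(w, b, x, y) - 1.0) < tol]

# Hypothetical separator and toy data (labels are +1 or -1)
w, b = (1.0, 1.0), -3.0
points = [(0, 0), (1, 1), (2, 2), (3, 3)]
labels = [-1, -1, 1, 1]

margin = geometric_margin(w, b, points, labels)
sv = support_vectors(w, b, points, labels)
print(margin, sv)
```

An SVM chooses the weights and bias that make this margin as large as possible; the e1071 tutorial in this subunit shows how to fit one in R.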

**Activity: Vienna University of Technology: David Meyer’s “Tutorial for Support Vector Machines in R”**
Link: Vienna University of Technology: David Meyer’s “Tutorial for Support Vector Machines in R” (PDF)

Instructions: Click on the link above and download the PDF file “svmdoc.pdf.” This is a tutorial on support vector machines in R. Follow the instructions given in the tutorial closely. You will need to install and load the “e1071” package for this tutorial.

Working through this tutorial should take approximately 3 hours.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**6.6.4 Introduction to Bayesian Inference**
- **Lecture: Cambridge University: Christopher Bishop’s “Introduction
To Bayesian Inference”**
Link: Cambridge University: Christopher Bishop’s “Introduction To
Bayesian Inference”
(Adobe Flash)

Instructions: Watch the video for Lecture 1. In this video, Christopher Bishop provides a brief overview of the past, the present, and the future of machine learning. You will also learn about Bayesian inference, the foundation of the third generation of machine learning techniques.

Watching this lecture should take approximately 1 hour and 30 minutes.

Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

**Assessment: The Saylor Foundation’s “Classification Techniques”**
Link: The Saylor Foundation’s “Classification Techniques” (PDF)

Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” (PDF) for subunit 6.6 (which was written in R).

Completing this assessment should take you no longer than 4 hours.

**Assessment: The Saylor Foundation’s “Unit 6.6 Assessment”**
Link: The Saylor Foundation’s “Unit 6.6 Assessment” (PDF)

Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.6 (PDF).

Completing this assignment should take you no longer than 2 hours.

**Unit 6 Assessment**
- **Assessment: The Saylor Foundation’s “Unit 6 Assessment”**
Link: The Saylor Foundation’s “Unit 6
Assessment”

Instructions: Complete this assessment to gauge your understanding
of the materials covered thus far in this course. When you click
“submit,” you will be shown the correct answers.