Module 4: Statistical Data Analysis

This module introduces fundamental concepts in data analysis. First, you will read about how to use the Pandas module in Python to perform many of the basic tasks you would otherwise do in Excel. Second, you will learn about the NumPy module, which provides support for fast numerical operations in Python. This module focuses on using NumPy with one-dimensional data (i.e., vectors or 1-D arrays); a later module will explore using NumPy for higher-dimensional data. Third, you will learn about descriptive statistics, which characterize a data set by using a few specific measurements. Finally, you will learn about advanced functionality in the Pandas module, including masking, grouping, stacking, and pivot tables.

Learning Objectives

By the end of this module, you should be able to:

  • understand how to move from analyzing data in Excel to analyzing it with Pandas,
  • work with one-dimensional numerical data by using the NumPy module,
  • compute and interpret descriptive statistics, and
  • apply advanced Pandas DataFrame features to analyze data more effectively.

Activities and Assignments

Activity or Assignment                                     Time Estimate   Deadline   Points
Module 4 Overview Video                                    10 minutes      N/A        N/A
Module 4 Lesson 1: Excel in Python                         1 hour          N/A        N/A
Module 4 Lesson 2: Introduction to NumPy                   2 hours         N/A        N/A
Module 4 Lesson 3: Introduction to Descriptive Statistics  2 hours         N/A        N/A
Module 4 Lesson 4: Advanced Pandas                         2 hours         N/A        N/A
Module 4 Assignment                                        1 hour          N/A        N/A

© 2017: Robert J. Brunner at the University of Illinois.

This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.