Module 8: Introduction to Density Estimation

Often, as part of exploratory data analysis, a histogram is used to understand how data are distributed. As shown in an earlier module, this technique can also be used to compute a probability mass function (or PMF) from a data set. However, the binning approach has issues, including a dependence on the number and width of the bins used to compute the histogram. One approach to overcome these issues is to fit a function to the binned data, which is known as parametric estimation. Alternatively, we can construct an approximation to the data by using non-parametric density estimation, of which the most commonly used technique is kernel density estimation (or KDE). In this module, you will learn about density estimation and specifically how to employ KDE. One often overlooked aspect of density estimation is that it produces a model representation of the data, which can be used to generate new data that resemble the original. This concept is demonstrated by applying density estimation to images of handwritten digits and sampling from the resulting model.
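To make the idea of a kernel density estimate concrete, the following is a minimal sketch in plain Python, not the implementation used in the lessons. It assumes a Gaussian kernel and a hand-picked bandwidth of 0.4; the helper names `gaussian_kde` and `sample_kde` and the synthetic data are hypothetical and chosen only for illustration. The estimate places one Gaussian bump on every data point and averages them, and new data are drawn by picking a data point at random and adding Gaussian noise whose width equals the bandwidth.

```python
import math
import random


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE built from `data`."""
    n = len(data)
    # Normalization so that the estimate integrates to one.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, each centered on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


def sample_kde(data, bandwidth, size, rng):
    """Draw new points: choose a data point, then add Gaussian noise."""
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(size)]


# Synthetic data (an assumption for this sketch, not course data).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Bandwidth chosen by hand; in practice it controls the smoothness.
density = gaussian_kde(data, bandwidth=0.4)

# Check numerically that the estimate behaves like a probability density.
step = 0.01
grid = [-8.0 + step * i for i in range(1601)]
area = sum(density(x) for x in grid) * step

# Generate a few new points from the fitted model.
new_points = sample_kde(data, 0.4, 5, rng)
```

Unlike a histogram, this estimate has no bin edges; the bandwidth plays the role that the bin width plays in a histogram, so a smaller value yields a bumpier curve and a larger value a smoother one.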

Learning Objectives

By the end of this module, you should be able to:

  • understand both parametric and non-parametric density estimation
  • understand the basic concepts behind kernel density estimation
  • use density estimation to approximate or smooth discrete data
  • construct a kernel density estimate by using the Python scikit-learn module, and
  • sample from a density estimate model to generate new data.
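As a preview of the last two objectives, the sketch below fits a kernel density estimate with scikit-learn's `KernelDensity` class and then samples from it. It is an illustration under stated assumptions rather than the code used in the lessons: the two-component synthetic data, the Gaussian kernel, and the bandwidth of 0.3 are all arbitrary choices made for this example.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic bimodal data (an assumption for this sketch, not course data).
rng = np.random.default_rng(0)
data = np.concatenate(
    [rng.normal(-2.0, 0.5, 150), rng.normal(1.5, 1.0, 150)]
)

# scikit-learn expects a 2D array of shape (n_samples, n_features).
X = data.reshape(-1, 1)

# Fit the estimate; the bandwidth here is chosen by hand.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)

# score_samples returns the log of the density, so exponentiate it.
grid = np.linspace(-6.0, 7.0, 1301).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))

# Numerical check that the estimate integrates to roughly one.
area = density.sum() * (grid[1, 0] - grid[0, 0])

# Draw new data from the fitted model.
new_samples = kde.sample(n_samples=10, random_state=0)
```

The array `density` can be plotted against `grid` to compare the smooth estimate with a histogram of `data`, and `new_samples` holds ten generated values with the same shape convention as `X`.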

Activities and Assignments

Activities and Assignments                               Time Estimate   Deadline   Points
Module 8 Overview Video                                  10 Minutes      N/A        N/A
Module 8 Lesson 1: Why learn Data Analytics?             1 Hour          N/A        N/A
Module 8 Lesson 2: Introduction to Density Estimation    2 Hours         N/A        N/A
Module 8 Lesson 3: Advanced Density Estimation           2 Hours         N/A        N/A
Module 8 Assignment                                      2 Hours         N/A        N/A

© 2017: Robert J. Brunner at the University of Illinois.

This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.