Module 8: Introduction to Density Estimation

A histogram is often used during exploratory data analysis to understand how data are distributed and, as shown in an earlier module, the same technique can be used to compute a probability mass function (PMF) from a data set. However, binning has drawbacks, including a dependence on the number and width of the bins used to compute the histogram. One way to overcome these issues is to fit a function to the binned data, which is known as parametric estimation. Alternatively, we can approximate the data with a non-parametric density estimate. The most commonly used non-parametric technique is kernel density estimation (KDE). In this module, you will learn about density estimation and, specifically, how to employ KDE. One often overlooked aspect of density estimation is that it produces a model of the data, which can be used to generate new data that resemble the original. This concept is demonstrated by applying density estimation to images of handwritten digits and sampling from the resulting model.
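To make the contrast between binning and kernel density estimation concrete, here is a minimal sketch using NumPy only. The bimodal synthetic sample, the bin counts, the bandwidth of 0.3, and the helper name `gaussian_kde` are all assumptions chosen for illustration; they do not come from the course materials.

```python
import numpy as np

# Synthetic bimodal sample (illustrative assumption, not course data)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(1.5, 1.0, 700)])

# Histogram-based density: the shape of the estimate depends on the bin count
hist_coarse, edges_coarse = np.histogram(data, bins=5, density=True)
hist_fine, edges_fine = np.histogram(data, bins=50, density=True)

def gaussian_kde(samples, grid, bandwidth):
    """Hypothetical helper: average one Gaussian bump per sample over the grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)

# KDE has no bins, but a bandwidth must be chosen instead
grid = np.linspace(-6.0, 7.0, 1000)
density = gaussian_kde(data, grid, bandwidth=0.3)

# Both estimates should integrate to roughly 1
hist_area = np.sum(hist_fine * np.diff(edges_fine))
kde_area = np.sum(density) * (grid[1] - grid[0])
print(f"histogram area: {hist_area:.3f}, KDE area: {kde_area:.3f}")
```

Plotting `hist_coarse`, `hist_fine`, and `density` side by side would show how the coarse histogram can hide the two modes, while the KDE gives a smooth curve whose detail is controlled by the bandwidth rather than by bin edges.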

Learning Objectives

By the end of this module, you should be able to:

understand both parametric and non-parametric density estimation
understand the basic concepts behind kernel density estimation
use density estimation to approximate or smooth discrete data
construct a kernel density estimate by using the Python scikit-learn module, and
sample from a density estimate model to generate new data.
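As a preview of the last two objectives, the following is a minimal sketch of fitting a kernel density estimate with scikit-learn and drawing new samples from it. The synthetic one-dimensional data, the Gaussian kernel, and the bandwidth of 0.5 are assumptions for illustration; the lessons may use different data and settings.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic data (illustrative assumption); scikit-learn expects
# an array of shape (n_samples, n_features)
rng = np.random.default_rng(42)
observed = rng.normal(loc=5.0, scale=2.0, size=(500, 1))

# Fit the density model; the bandwidth here is an arbitrary choice
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(observed)

# score_samples returns the log of the density, so exponentiate it
grid = np.linspace(-5.0, 15.0, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))

# Draw new points from the fitted model
new_points = kde.sample(n_samples=100, random_state=0)
print(new_points.shape)
```

Note that `KernelDensity.sample` is implemented only for the Gaussian and tophat kernels, which is one reason the Gaussian kernel is used in this sketch.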

Activities and Assignments

Activities and Assignments	Time Estimate	Deadline	Points
Module 8 Overview Video	10 Minutes	N/A	N/A
Module 8 Lesson 1: Why learn Data Analytics?	1 Hour	N/A	N/A
Module 8 Lesson 2: Introduction to Density Estimation	2 Hours	N/A	N/A
Module 8 Lesson 3: Advanced Density Estimation	2 Hours	N/A	N/A
Module 8 Assignment	2 Hours	N/A	N/A

This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.