A few things you should keep in mind when working on assignments:
Make sure you fill in any place that says YOUR CODE HERE. Do not write your answer anywhere other than where it says YOUR CODE HERE. Anything you write anywhere else will be removed or overwritten by the autograder.
Before you submit your assignment, make sure everything runs as expected. In the menubar, select Kernel, then restart the kernel and run all cells (Restart & Run All).
Do not change the title (i.e. file name) of this notebook.
Make sure that you save your work (in the menubar, select File → Save and Checkpoint).
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from sklearn.neighbors import KernelDensity
from nose.tools import assert_equal, assert_is_instance, assert_is_not, assert_almost_equal
The problems will use data from the Dow Jones Index.
#Load the data and see what it looks like
df = pd.read_csv('./dow_jones_index.data')
df.head()
Write a function called histogram_plotter that takes in a data frame, a column name from that data frame, and a number of bins and then plots a histogram of the data in that column.
Furthermore:
Set the y axis label to "Counts"
Set the x axis label to the name of the column being plotted
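As a rough illustration of the matplotlib calls involved, here is a minimal sketch on synthetic data. The data frame `demo` and its column name `value` are hypothetical stand-ins, not part of the assignment data, and this is not the graded solution.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; 'demo' and the column 'value' are hypothetical.
rng = np.random.RandomState(0)
demo = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=500)})

# Create an Axes object, draw the histogram on it, then label both axes.
fig, ax = plt.subplots()
counts, edges, patches = ax.hist(demo["value"], bins=20)
ax.set_xlabel("value")
ax.set_ylabel("Counts")
```

Working with an explicit `ax` object makes it easy to return the axes from a function so that its labels and limits can be inspected afterwards.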
def histogram_plotter(df, column, num_bins):
    """
    Input
    ------
    df: a pandas dataframe that contains the column we want to plot
    column: a string that is the name of the column to be plotted
    num_bins: an integer, the number of bins to use

    Output
    ------
    ax: a matplotlib.axes._subplots.AxesSubplot object
    """
    ### YOUR CODE HERE

    return ax
my_plot = histogram_plotter(df, 'open', 20)
assert_equal(my_plot.get_xlabel(), 'open')
assert_is_instance(my_plot, mpl.axes.Axes)
assert_almost_equal(my_plot.get_ylim()[1], 100.8)
assert_equal(len(my_plot.get_xticks()), 11)
assert_equal(my_plot.get_ylabel(), 'Counts')
Write a function called kde_plotter that takes in a data frame, a column name from that data frame, and a number of bins and then plots a histogram along with a kernel density estimate of the data in that column, using seaborn.
Furthermore:
Set the y axis label to "Density"
Set the x axis label to the name of the column being plotted
def kde_plotter(df, column, num_bins):
    """
    Input
    ------
    df: a pandas dataframe that contains the column we want to plot
    column: a string that is the name of the column to be plotted
    num_bins: an integer, the number of bins to use

    Output
    ------
    ax: a matplotlib.axes._subplots.AxesSubplot object
    """
    ### YOUR CODE HERE

    return ax
my_kde = kde_plotter(df, 'open', 20)
x, y = my_kde.get_lines()[0].get_data()
assert_almost_equal(0.00159, y[10], places=3)
assert_almost_equal(17.617908, x[20], places=3)
assert_equal(my_kde.get_xlabel(), 'open')
assert_is_instance(my_kde,mpl.axes.Axes)
assert_equal(my_kde.get_ylabel(), 'Density')
For this problem, create a 2D KDE in the mv_kde function, where the x axis is percent_change_price and the y axis is high. Both of these variables are in the dataframe that is passed into mv_kde.
def mv_kde(df):
    '''
    df: dataframe with data from the Dow Jones Index

    returns a seaborn JointGrid object
    '''
    ### YOUR CODE HERE

    return ax
pcp_h = mv_kde(df)
assert_is_instance(pcp_h, sns.axisgrid.JointGrid, msg='Return a JointGrid object; you can do this by using the JointGrid function in seaborn.')
assert_equal(np.array_equal(pcp_h.x, df.percent_change_price.values), True, msg='Percent change price should be used for the x-axis')
assert_equal(np.array_equal(pcp_h.y, df.high.values), True, msg='High should be used for the y-axis')
We have taken a subset of the Dow Jones dataset and stored it in a variable called X, which is displayed below. Using the data in X, we want to generate more stock data by fitting a KDE and sampling from its distribution.
Your task is to complete the function gen_stock_data. This function takes in X (the data), n_samples (the number of samples to produce), and random_state (which controls the generator state used for random sampling).
For this function:
X = df[['open', 'high', 'low', 'close']]
X.head()
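Fitting a KDE and drawing new samples from it with scikit-learn's `KernelDensity` can be sketched as below. The array `toy` is synthetic, and the Gaussian kernel and the bandwidth of 0.5 are illustrative assumptions, not the settings the autograder expects.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for a four-column dataset; 'toy' is a hypothetical name.
rng = np.random.RandomState(0)
toy = rng.normal(size=(200, 4))

# Kernel and bandwidth are illustrative assumptions, not the graded settings.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(toy)

# Passing random_state makes the draw reproducible across calls.
samples = kde.sample(n_samples=10, random_state=0)
```

The result is a NumPy array with one row per sample and one column per feature of the fitted data.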
def gen_stock_data(X, n_samples=100, random_state=0):
    '''
    X - dataset containing a subset of the Dow Jones data
    n_samples - integer which tells us how many samples to return
    random_state - controls generator state for random sampling
    '''
    ### YOUR CODE HERE
sd1 = gen_stock_data(X, n_samples=1000, random_state=0)
fig, ((ax1_orig, ax2_orig, ax3_orig, ax4_orig),
(ax1_samp ,ax2_samp, ax3_samp, ax4_samp)) = plt.subplots(2, 4, figsize=(10, 5))
# Note: density=True replaces the normed=True argument removed in newer matplotlib releases.
ax1_orig.hist(X.open, alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax2_orig.hist(X.high, alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax3_orig.hist(X.low, alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax4_orig.hist(X.close, alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
def column(matrix, i):
    return [row[i] for row in matrix]
ax1_samp.hist(column(sd1,0), alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax2_samp.hist(column(sd1,1), alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax3_samp.hist(column(sd1,2), alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
ax4_samp.hist(column(sd1,3), alpha=0.5, color=sns.xkcd_rgb["denim blue"], density=True, label='')
for i in [ax1_orig, ax2_orig, ax3_orig, ax4_orig, ax1_samp, ax2_samp, ax3_samp, ax4_samp]:
    # Keep y ticks only on the first column; the original "or" test was always true.
    if i is not ax1_orig and i is not ax1_samp:
        i.set_yticks([])
ax1_orig.set_title('open', fontsize=14)
ax2_orig.set_title('high', fontsize=14)
ax3_orig.set_title('low', fontsize=14)
ax4_orig.set_title('close', fontsize=14)
ax1_orig.set_ylabel('Original Data', fontsize=14)
ax1_samp.set_ylabel('Sampled Data', fontsize=14)
from helper import gsd
assert_is_instance(sd1, np.ndarray, msg='Your function does not return a numpy array.')
assert_equal(len(sd1), 1000, msg='Your function should use the n_samples parameter. The array should have 1000 rows; it currently has {0}.'.format(len(sd1)))
assert_equal(np.array_equal(sd1, gsd(X, n_samples=1000, random_state=0)), True, msg='The generated data does not match the solution')
© 2017: Robert J. Brunner at the University of Illinois.
This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.