A few things you should keep in mind when working on assignments:
Make sure you fill in any place that says YOUR CODE HERE. Do not write your answer anywhere other than where it says YOUR CODE HERE. Anything you write anywhere else will be removed or overwritten by the autograder.
Before you submit your assignment, make sure everything runs as expected. In the menubar, select Kernel, then Restart & Run All to restart the kernel and run all cells.
Do not change the title (i.e. file name) of this notebook.
Make sure that you save your work (in the menubar, select File → Save and Checkpoint).
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy.stats as st
from nose.tools import assert_equal, assert_almost_equal, assert_is_instance, assert_is_not
%matplotlib inline
In the following problems we will use the Dow Jones Index data.
df = pd.read_csv('dow_jones_index.data')
df.head()
Write a function called "scatter_plot" that takes in a stock name and two other column names from the Dow Jones Index data, and draws a scatter plot of the two columns for that stock.
For example, the function would be able to take in "AA", "open", and "close" as inputs, and it would plot the scatter plot of "open" and "close" for the "AA" stock.
Furthermore:
Give the x-axis the same label as the column name passed in as "x_data"
Give the y-axis the same label as the column name passed in as "y_data"
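The general pattern of selecting the rows for one group and plotting two of its columns can be sketched on a small made-up table. This is only an illustration: the DataFrame `toy` and its column names `label`, `u`, and `v` are hypothetical and are not part of the Dow Jones Index data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "label": ["a", "a", "b", "a", "b"],
    "u": [1.0, 2.0, 3.0, 4.0, 5.0],
    "v": [2.0, 4.0, 1.0, 8.0, 3.0],
})

# A boolean mask keeps only the rows whose label is "a".
subset = toy[toy["label"] == "a"]

# Draw the scatter plot on an explicit Axes object and label both axes.
fig, ax = plt.subplots()
ax.scatter(subset["u"], subset["v"])
ax.set_xlabel("u")
ax.set_ylabel("v")
```

Working with an explicit Axes object makes it straightforward to return the plot from a function and inspect it afterwards, for example through `ax.get_xlabel()`.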
def scatter_plot(df, stock_name, x_data, y_data):
    """
    Inputs
    ------
    df: a pandas dataframe, the dataframe containing the relevant data
    stock_name: a string, the name of the stock
    x_data: a string, the name of the first column to be used
    y_data: a string, the name of the second column to be used

    Output
    ------
    ax: a matplotlib.axes object
    """
    ### YOUR CODE HERE
    return ax
my_plot = scatter_plot(df, 'AA', 'open', 'close')
assert_equal(my_plot.get_xlabel(), 'open')
assert_equal(my_plot.get_ylabel(), 'close')
assert_almost_equal(my_plot.collections[0].get_offsets()[0][0], 15.82)
assert_equal(len(my_plot.collections[0].get_offsets()), 25)
Write a function called "scatter_plot_compare" that takes in two stock names and two other column names from the Dow Jones Index data, and draws a scatter plot of the two columns for each of the two stocks on the same plot.
For example, the function would be able to take in "AA", "BA", "open", and "close" as inputs, and it would plot the scatter plot of "open" and "close" for the "AA" stock and "BA" stock.
Furthermore:
Give the x-axis the same label as the column name passed in as "x_data"
Give the y-axis the same label as the column name passed in as "y_data"
Add a legend so that the data points for the first stock are labeled with the first stock's name, and the data points for the second stock are labeled with the second stock's name.
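As a rough illustration of how legend entries are attached to scatter groups in matplotlib, the sketch below plots two made-up groups on the same Axes. The data values and the labels "first" and "second" are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Each scatter call adds one collection to the Axes; the label keyword
# sets the text that the legend shows for that collection.
ax.scatter([1, 2, 3], [1, 4, 9], label="first")
ax.scatter([1, 2, 3], [2, 3, 5], label="second")

# The legend must be created after the labeled artists have been added.
legend = ax.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
```

The legend entries appear in the order in which the labeled scatter calls were made.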
def scatter_plot_compare(df, stock_name1, stock_name2, x_data, y_data):
    """
    Inputs
    ------
    df: a pandas dataframe, the dataframe containing the relevant data
    stock_name1: a string, the name of the first stock
    stock_name2: a string, the name of the second stock
    x_data: a string, the name of the first column to be used
    y_data: a string, the name of the second column to be used

    Output
    ------
    ax: a matplotlib.axes object
    """
    ### YOUR CODE HERE
    return ax
my_plot = scatter_plot_compare(df, 'AA', 'GE', 'open', 'close')
assert_equal(my_plot.get_xlabel(), 'open')
assert_equal(my_plot.get_ylabel(), 'close')
assert_equal(len(my_plot.collections), 2)
assert_equal(my_plot.legend().get_texts()[0].get_text(), 'AA')
assert_equal(my_plot.legend().get_texts()[1].get_text(), 'GE')
assert_equal(len(my_plot.collections[1].get_offsets()), 25)
assert_equal(len(my_plot.collections[0].get_offsets()), 25)
df_mat = df.values # convert pandas to numpy multi dim array
print(df_mat[:5])
For this problem you will finish writing the function get_column. This function takes in df_mat, a multidimensional numpy array, and col, the integer index of a column. Your function should return the entire column col of the array.
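For reference, the sketch below shows how slicing along the second axis of a two-dimensional numpy array works. The array `toy` is made up for illustration and is unrelated to df_mat.

```python
import numpy as np

# A hypothetical 3x3 array used only for illustration.
toy = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# The colon selects every row; the integer selects a single column.
second_col = toy[:, 1]   # array([2, 5, 8])
```

The result is a one-dimensional array with one entry per row of the original array.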
def get_column(df_mat, col):
    '''
    df_mat: multidimensional array
    col: integer index
    returns numpy array which is the column col of df_mat
    '''
    ### YOUR CODE HERE
The dates printed below should match those shown in the output above.
print(get_column(df_mat, 2)[:5])
from helper import gc
assert_equal(np.array_equal(get_column(df_mat,3), gc(df_mat,3)), True)
assert_equal(np.array_equal(get_column(df_mat,0), gc(df_mat,0)), True)
assert_equal(np.array_equal(get_column(df_mat,5), gc(df_mat,5)), True)
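In this problem you will finish writing the correlation function. It has the following parameters: df_mat, a multidimensional array, and col1 and col2, integer indices used to index the columns of df_mat. Your task is to do the following, as checked by the tests below: draw a scatter plot of the two columns on ax, give the plot a title and descriptive x- and y-axis labels, compute the Pearson and Spearman correlations of the two columns, and return ax together with the two correlation results (the tests read the coefficient as the first element of each result).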
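As a small illustration of the two correlation measures, the sketch below applies `scipy.stats.pearsonr` and `scipy.stats.spearmanr` to made-up data. The arrays `x` and `y` are hypothetical; `y` is chosen to be monotonic in `x` but not linear, so the two measures differ.

```python
import numpy as np
import scipy.stats as st

# Hypothetical data: y increases with x, but not along a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

# Both functions return a result whose first element is the
# correlation coefficient and whose second element is a p-value.
pearson = st.pearsonr(x, y)
spearman = st.spearmanr(x, y)

pearson_r = pearson[0]    # below 1, because the relation is not linear
spearman_r = spearman[0]  # 1 (up to rounding), because the ranks agree exactly
```

Pearson correlation measures linear association, while Spearman correlation is computed on ranks and therefore only measures whether the relationship is monotonic.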
def correlation(df_mat, col1, col2):
    fig, ax = plt.subplots(figsize=(10, 5))
    ### YOUR CODE HERE
from helper import c
col1, col2 = 6, 7
sol = c(df_mat, col1, col2)
ax, pc, sc = correlation(df_mat, col1, col2)
data = ax.collections[0].get_offsets()
print('Pearson Correlation: {0}'.format(pc[0]))
print('Spearman Correlation: {0}'.format(sc[0]))
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")
assert_equal(np.array_equal(data[:,0], sol[0]), True, msg="Data for the x-axis is not correct")
assert_equal(np.array_equal(data[:,1], sol[1]), True, msg="Data for the y-axis is not correct")
assert_almost_equal(pc[0], sol[2][0])
assert_almost_equal(sc[0], sol[3][0])
Your task is to finish writing the reg_plot function, which fits an OLS model to two columns of data. reg_plot takes in the following parameters: df, a dataframe (not a numpy multidimensional array), and x and y, strings that specify the names of two columns. Use regplot in seaborn to fit an OLS model to the data. Your plot should have labels for the x- and y-axes and also a title.
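For orientation, here is a minimal sketch of calling seaborn's `regplot` on made-up data. The DataFrame `toy`, its columns `u` and `v`, and the title text are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: v is roughly twice u plus some random noise.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"u": np.arange(20, dtype=float)})
toy["v"] = 2.0 * toy["u"] + rng.normal(scale=1.0, size=len(toy))

# regplot draws the scatter points and a fitted regression line
# on the Axes object passed through the ax keyword.
fig, ax = plt.subplots()
sns.regplot(x="u", y="v", data=toy, ax=ax)
ax.set_xlabel("u")
ax.set_ylabel("v")
ax.set_title("v against u with a fitted regression line")
```

The fitted line is stored as a line artist on the Axes, so its coordinates can be read back through `ax.lines[0].get_xdata()` and `ax.lines[0].get_ydata()`.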
def reg_plot(df, x, y):
    '''
    df: dataframe
    x: column name
    y: column name
    '''
    ### YOUR CODE HERE
    return ax
ax = reg_plot(df, x='open', y='close')
from helper import rp
sol_x, sol_y = rp(df)
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")
assert_equal(np.array_equal(ax.lines[0].get_ydata(), sol_y), True, msg="Data on Y-Axis is incorrect")
assert_equal(np.array_equal(ax.lines[0].get_xdata(), sol_x), True, msg="Data on x-axis is incorrect")
# Re-plot with the new columns so that ax matches the solution computed below.
ax = reg_plot(df, x='high', y='volume')
sol_x, sol_y = rp(df, 'high', 'volume')
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")
assert_equal(np.array_equal(ax.lines[0].get_ydata(), sol_y), True, msg="Data on y-axis is incorrect")
assert_equal(np.array_equal(ax.lines[0].get_xdata(), sol_x), True, msg="Data on x-axis is incorrect")
© 2017: Robert J. Brunner at the University of Illinois.
This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.