Module 7 Assignment

A few things you should keep in mind when working on assignments:

Make sure you fill in any place that says YOUR CODE HERE. Do not write your answer anywhere other than where it says YOUR CODE HERE; anything you write elsewhere will be removed or overwritten by the autograder.
Before you submit your assignment, make sure everything runs as expected: in the menubar, select Kernel, then Restart & Run All.
Do not change the title (i.e., the file name) of this notebook.
Make sure that you save your work (in the menubar, select File → Save and Checkpoint).

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy.stats as st
from nose.tools import assert_equal, assert_almost_equal, assert_is_instance, assert_is_not
%matplotlib inline

In the following problems we will use the Dow Jones Index data set, loaded from the file dow_jones_index.data.

df = pd.read_csv('dow_jones_index.data')
df.head()

Problem 1: Creating a Scatter Plot

Write a function called "scatter_plot" that takes in the DataFrame, a stock name, and two column names from the Dow Jones Index data, and draws a scatter plot of those two columns for the given stock.

For example, given "AA", "open", and "close" as inputs, the function should draw the scatter plot of "open" and "close" for the "AA" stock.

Furthermore:

Label the x-axis with the column name passed as "x_data".
Label the y-axis with the column name passed as "y_data".
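The filter-then-plot pattern can be sketched as follows. This is a minimal illustration, not the graded solution: it assumes the ticker symbol lives in a column named `stock` (as in the UCI Dow Jones Index file), uses a small made-up DataFrame instead of the real data, and the helper name `scatter_for_stock` is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_for_stock(df, stock_name, x_col, y_col):
    """Scatter x_col against y_col for one stock (hypothetical helper)."""
    # Boolean mask keeps only this ticker's rows; assumes a 'stock' column.
    subset = df[df["stock"] == stock_name]
    # A fresh figure per call, so this stock's points are ax.collections[0].
    fig, ax = plt.subplots()
    ax.scatter(subset[x_col], subset[y_col])
    # The axis labels are simply the column names that were passed in.
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    return ax


# Made-up rows for illustration; these are not values from the real file.
toy = pd.DataFrame({
    "stock": ["AA", "AA", "BA", "AA"],
    "open": [10.0, 11.0, 70.0, 12.0],
    "close": [10.5, 11.2, 71.0, 11.8],
})
ax = scatter_for_stock(toy, "AA", "open", "close")
print(ax.get_xlabel(), ax.get_ylabel(), len(ax.collections[0].get_offsets()))  # open close 3
```

Creating the figure inside the function means each call starts from an empty Axes, so the only scatter collection on it belongs to the requested stock.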

def scatter_plot(df,stock_name,x_data,y_data):
    """
    Inputs
    ------
    df: a pandas dataframe, the dataframe containing the relevant data
    
    stock_name: a string, the name of the stock
    
    x_data: a string, the name of the first column to be used
    
    y_data: a string, the name of the second column to be used
    
    Output
    ------
    
    ax: a matplotlib.axes.Axes object
    
    """
    
    ### YOUR CODE HERE
    
    return ax

my_plot = scatter_plot(df, 'AA', 'open', 'close')

assert_equal(my_plot.get_xlabel(), 'open')
assert_equal(my_plot.get_ylabel(), 'close')
assert_almost_equal(my_plot.collections[0].get_offsets()[0][0], 15.82)
assert_equal(len(my_plot.collections[0].get_offsets()), 25)

Problem 2: Two Scatter Plots at Once

Write a function called "scatter_plot_compare" that takes in the DataFrame, two stock names, and two column names from the Dow Jones Index data, and draws a scatter plot of those two columns for each of the two stocks on the same axes.

For example, given "AA", "BA", "open", and "close" as inputs, the function should draw the scatter plot of "open" and "close" for both the "AA" stock and the "BA" stock.

Furthermore:

Label the x-axis with the column name passed as "x_data".
Label the y-axis with the column name passed as "y_data".
Add a legend in which the data points for the first stock are labeled with the first stock's name and the data points for the second stock are labeled with the second stock's name.
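Overlaying two groups on one Axes only needs one `scatter` call per group, each with a `label=`, followed by `ax.legend()`. A hedged sketch on made-up data, again assuming a `stock` ticker column; `scatter_two_stocks` is a hypothetical name, not the function the tests call.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_two_stocks(df, stock1, stock2, x_col, y_col):
    """Overlay two stocks on one Axes (hypothetical helper)."""
    fig, ax = plt.subplots()
    for name in (stock1, stock2):
        subset = df[df["stock"] == name]  # assumes a 'stock' ticker column
        # Each call adds one collection to ax; label= is the legend text.
        ax.scatter(subset[x_col], subset[y_col], label=name)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()  # picks up the labeled collections in drawing order
    return ax


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "stock": ["AA", "GE", "AA", "GE"],
    "open": [10.0, 20.0, 11.0, 21.0],
    "close": [10.5, 19.5, 11.5, 21.5],
})
ax = scatter_two_stocks(toy, "AA", "GE", "open", "close")
print([t.get_text() for t in ax.get_legend().get_texts()])  # ['AA', 'GE']
```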

def scatter_plot_compare(df,stock_name1,stock_name2,x_data,y_data):
    """
    Inputs
    ------
    df: a pandas dataframe, the dataframe containing the relevant data
    
    stock_name1: a string, the name of the first stock
    
    stock_name2: a string, the name of the second stock
    
    x_data: a string, the name of the first column to be used
    
    y_data: a string, the name of the second column to be used
    
    Output
    ------
    
    ax: a matplotlib.axes.Axes object
    
    """
    
    ### YOUR CODE HERE
    
    return ax

my_plot = scatter_plot_compare(df, 'AA', 'GE', 'open', 'close')

assert_equal(my_plot.get_xlabel(), 'open')
assert_equal(my_plot.get_ylabel(), 'close')
assert_equal(len(my_plot.collections), 2)
assert_equal(my_plot.legend().get_texts()[0].get_text(), 'AA')
assert_equal(my_plot.legend().get_texts()[1].get_text(), 'GE')
assert_equal(len(my_plot.collections[1].get_offsets()), 25)
assert_equal(len(my_plot.collections[0].get_offsets()), 25)

df_mat = df.values  # convert the DataFrame to a 2-D NumPy array (dtype=object, since the columns mix strings and numbers)
print(df_mat[:5])

Problem 3: Grabbing Columns from a Multidimensional NumPy Array

For this problem you will finish writing the function get_column. It takes in df_mat, a multidimensional NumPy array, and col, the integer index of a column, and should return that entire column of df_mat.
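NumPy's two-axis slicing does this in one step: `arr[:, col]` takes every row (`:`) from a single column. The sketch below uses a tiny made-up object array standing in for `df.values`; `column_of` is a hypothetical helper name.

```python
import numpy as np

# Made-up rows shaped like [quarter, stock, date, open]; dtype=object mimics
# what df.values produces when the columns mix strings and numbers.
mat = np.array([
    [1, "AA", "1/7/2011", 10.0],
    [1, "AA", "1/14/2011", 11.0],
    [1, "AA", "1/21/2011", 12.0],
], dtype=object)


def column_of(arr, col):
    """Return every row's entry in column `col` (hypothetical helper)."""
    # ':' selects all rows, `col` selects one column -> 1-D array of length n_rows.
    return arr[:, col]


dates = column_of(mat, 2)
print(dates.shape)  # (3,)
print(dates)
```

The result is a 1-D view of the column rather than a copy; writing `arr[:, [col]]` instead would return a 2-D array of shape `(n_rows, 1)`.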

def get_column(df_mat, col):
    '''
    df_mat: multi dimensional array
    
    col: integer index
    
    returns numpy array which is the column col of df_mat
    '''
    
    ### YOUR CODE HERE

Get the First 5 Dates

These dates should match the date column of the rows printed above.

print(get_column(df_mat, 2)[:5])

from helper import gc
assert_equal(np.array_equal(get_column(df_mat,3), gc(df_mat,3)), True)
assert_equal(np.array_equal(get_column(df_mat,0), gc(df_mat,0)), True)
assert_equal(np.array_equal(get_column(df_mat,5), gc(df_mat,5)), True)

Problem 4: Correlation of Columns

In this problem you will finish writing the correlation function. It takes df_mat, a multidimensional array, and col1 and col2, integer indices used to select columns of df_mat. Your task is to do the following:

1. Get col1 and col2 from df_mat.
2. Plot col1 and col2 from df_mat using the scatter function from pyplot.
   - Your plot should have a title and labels for the x and y axes.
3. Compute the Pearson and Spearman correlations of col1 and col2.
4. Lastly, return:
   - the Axes object (we have created this for you)
   - the Pearson correlation
   - the Spearman correlation

The tests read the coefficient as the first element of each correlation result (pc[0] and sc[0]), so return the full result objects, such as those from scipy.stats.pearsonr and scipy.stats.spearmanr, rather than bare floats.
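A sketch of the correlation step using `scipy.stats` (imported in this notebook as `st`), on made-up numbers. Because `df.values` on mixed-type data yields an object array, the columns are cast to float first; that assumes the chosen columns hold numeric values. Pearson measures linear association, while Spearman is the Pearson correlation of the ranks and so measures monotonic association.

```python
import numpy as np
import scipy.stats as st

# Made-up columns; sliced from an object-dtype df.values they arrive as objects.
x = np.array([10.0, 11.0, 12.5, 13.0, 15.0], dtype=object)
y = np.array([200, 180, 260, 240, 400], dtype=object)

# Cast to float so SciPy receives a plain numeric array.
xf = x.astype(float)
yf = y.astype(float)

pearson = st.pearsonr(xf, yf)    # tuple-like (coefficient, p-value)
spearman = st.spearmanr(xf, yf)  # tuple-like (rank coefficient, p-value)

# Index 0 is the coefficient, matching how the tests read pc[0] and sc[0].
print(pearson[0], spearman[0])
```

Here the ranks of `y` are 2, 1, 4, 3, 5 against 1 to 5 for `x`, so the Spearman coefficient works out to 1 - 6·4/(5·24) = 0.8.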

def correlation(df_mat, col1, col2):
    
    fig, ax = plt.subplots(figsize=(10, 5))
    
    ### YOUR CODE HERE

from helper import c

col1, col2 = 6, 7
sol = c(df_mat, col1, col2)

ax, pc, sc = correlation(df_mat, col1, col2)
data = ax.collections[0].get_offsets()
print('Pearson Correlation: {0}'.format(pc[0]))
print('Spearman Correlation: {0}'.format(sc[0]))
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")

assert_equal(np.array_equal(data[:,0], sol[0]), True, msg="Data for the x axis is not correct")
assert_equal(np.array_equal(data[:,1], sol[1]), True, msg="Data for the y axis is not correct")

assert_almost_equal(pc[0], sol[2][0])
assert_almost_equal(sc[0], sol[3][0])

Problem 5: Fitting an OLS Model to Data

Your task is to finish writing the reg_plot function, which fits an ordinary least squares (OLS) model to two columns of data. reg_plot takes a DataFrame (not a NumPy multidimensional array) and x and y, strings that name the columns to use. Use regplot in seaborn to fit the OLS model to the data and plot it. Your plot should have labels for the x and y axes and a title, and the function should return the Axes object.
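seaborn's `regplot` draws the data together with a straight-line least-squares fit (plus a bootstrapped confidence band by default). To see what that line is, the sketch below computes the same kind of degree-1 OLS fit directly with NumPy on made-up points; it does not reproduce regplot's exact plotted coordinates.

```python
import numpy as np

# Made-up (x, y) pairs lying close to y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# A degree-1 polyfit minimizes the sum of squared residuals (OLS).
slope, intercept = np.polyfit(x, y, deg=1)

# The same fit as a linear least-squares solve on the design matrix [x, 1].
X = np.column_stack([x, np.ones_like(x)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Evaluate the fitted line on an evenly spaced grid, as a plotted line would be.
grid = np.linspace(x.min(), x.max(), 100)
fitted = slope * grid + intercept

print(slope, intercept)  # approximately 1.97 and 1.09
```

In the notebook, `sns.regplot(x=x, y=y, data=df)` performs a linear fit of this kind and returns the Axes it drew on, so `set_xlabel`, `set_ylabel`, and `set_title` can be called on that return value.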

def reg_plot(df, x, y):
    '''
    df: a pandas DataFrame
    
    x: a string, the name of the column for the x-axis
    
    y: a string, the name of the column for the y-axis
    '''
    
    ### YOUR CODE HERE
    
    return ax

ax = reg_plot(df, x='open', y='close')
from helper import rp
sol_x, sol_y = rp(df)
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")
assert_equal(np.array_equal(ax.lines[0].get_ydata(), sol_y), True, msg="Data on y-axis is incorrect")
assert_equal(np.array_equal(ax.lines[0].get_xdata(), sol_x), True, msg="Data on x-axis is incorrect")

ax = reg_plot(df, x='high', y='volume')
sol_x, sol_y = rp(df, 'high', 'volume')
assert_is_instance(ax, mpl.axes.Axes, msg='Return an Axes object.')
assert_is_not(len(ax.title.get_text()), 0, msg="Your plot doesn't have a title.")
assert_is_not(ax.xaxis.get_label_text(), '', msg="Change the x-axis label to something more descriptive.")
assert_is_not(ax.yaxis.get_label_text(), '', msg="Change the y-axis label to something more descriptive.")
assert_equal(np.array_equal(ax.lines[0].get_ydata(), sol_y), True, msg="Data on y-axis is incorrect")
assert_equal(np.array_equal(ax.lines[0].get_xdata(), sol_x), True, msg="Data on x-axis is incorrect")

This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.