Open In Colab

Question 1

Question 1a

Write a function summation that evaluates the following summation for $n \geq 1$:

$$\sum_{i=1}^{n} i^3 + 3 i^2$$

1
2
3
4
def summation(n):
    """Compute the summation i^3 + 3 * i^2 for 1 <= i <= n."""
    ...
    return sum(i**3+3*i**2 for i in range(1,n+1))

Question 1b

Write a function elementwise_list_sum that computes the square of each value in list_1, the cube of each value in list_2, then returns a list containing the element-wise sum of these results. Assume that list_1 and list_2 have the same number of elements.

Hint: The zip function may be useful here.

1
2
3
4
5
6
7
def elementwise_list_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x, y in list_1, list_2.

    Assume list_1 and list_2 have the same length.
    """
    assert len(list_1) == len(list_2), "both args must have the same number of elements"
    return [x**2+y**3 for x,y in zip(list_1, list_2)]

Question 1c

Recall the formula for population variance below:

$$\sigma^2 = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}$$

Complete the functions below to compute the population variance of population, an array of numbers. For this question, do not use built in NumPy functions; we will use NumPy to verify your code. Don’t worry if you’re unfamiliar with what NumPy is, we discuss it in the next section.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def mean(population):
    """
    Returns the mean of population (mu)

    Keyword arguments:
    population -- a numpy array of numbers
    """
    # Calculate the mean of a population
    return sum(population)/len(population)

def variance(population):
    """
    Returns the variance of population (sigma squared)

    Keyword arguments:
    population -- a numpy array of numbers
    """
    # Calculate the variance of a population
    return mean([ (i-mean(population))**2 for i in population])

Question 2

The core of NumPy is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

In Data 8, we used make_array from the datascience module, but that’s not the most typical way. Instead, use np.array to create an array. It takes a sequence, such as a list or range.

Below, create an array arr containing the values 1, 2, 3, 4, and 5 (in that order)

1
arr = np.array([1,2,3,4,5])

Question 3

Question 3a

Given the array random_arr, assign valid_values to an array containing all values $x$ such that $2x^4 > 1$.

Note: You should not use for loops in your solution. Instead, look at numpy’s documentation on Boolean Indexing.

1
2
3
np.random.seed(42)
random_arr = np.random.rand(60)
valid_values = random_arr[2*(random_arr**4)>1]

Question 3b

Use NumPy to recreate your answer to Question 1b. The input parameters will both be python lists, so you will need to convert the lists into arrays before performing your operations. The output should be a numpy array.

Hint: Use the NumPy documentation. If you’re stuck, try a search engine! Searching the web for examples of how to use modules is very common in data science.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10

def elementwise_array_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x, y in list_1, list_2.

    Assume list_1 and list_2 have the same length.

    Return a NumPy array.
    """
    assert len(list_1) == len(list_2), "both args must have the same number of elements"
    return np.array([x**2+y**3 for x, y in zip(list_1, list_2)])

With the larger dataset, we see that using NumPy results in code that executes over 50 times faster! Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly

Question 4

Consider the function $f(x) = x^2$ for $-\infty < x < \infty$.

Question 4c

Write code to plot the function $f$, the tangent line at $x=8$, and the tangent line at $x=0$.

Set the range of the x-axis to (-15, 15) and the range of the y-axis to (-100, 300) and the figure size to (4,4).

Your resulting plot should look like this:

pic

You should use the plt.plot function to plot lines. You may find the following functions useful:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def f(x):
    return x**2

def df(x):
    return 2*x

def plot(f, df):
    plt.style.use('fivethirtyeight')
    plt.figure(figsize=(4,4))
    plt.ylim(-100, 300)
    x = np.linspace(-15, 15, 100)
    plt.plot(x, f(x),color='#008fd5')
    plt.axhline(y= df(0), color='#fc4f30')
    plt.axline((8,64),slope=df(8), color='#e5af3d')
    # plt.axis('off')
    # plt.grid(True)
    plt.show()


plot(f, df)

Result: question4

Question 5

Data scientists use coin tossing as a visual image for sampling at random with replacement from a binary population.

Question 5a

A coin that lands heads with chance 0.8 is tossed six times. What is the chance of the sequence HHHTHT? Assign your answer to the variable p_HHHTHT.

1
2
p_HHHTHT = (0.8 ** 4) * (0.2 ** 2)
p_HHHTHT

Question 5b

I have a coin that lands heads with an unknown probability $p$. I toss it 10 times and get the sequence TTTHTHHTTH.

If you toss this coin 10 times, the chance that you get the sequence above is a function of $p$. That function is called the likelihood of the sequence TTTHTHHTTH, so we will call it $l$.

Below is the graph of $l$ as a function of $p$ for $p \in [0, 1]$. As we see in the graph, the likelihood of observing TTTHTHHTTH varies as we change the value of $p$. Certain values of $p$ make the sequence more likely than others.

1
2
3
4
5
6
7
p = np.linspace(0, 1, 100)
likelihood = (p**4) * ((1-p)**6)
plt.plot(p, likelihood, lw=2, color='darkblue') # lw is line width
plt.plot([0, 1], [0, 0], lw=1, color='grey')    # horizontal axis
plt.xlabel('$p$')
plt.ylabel('$l(p)$', rotation=0)
plt.title('Likelihood of TTTHTHHTTH')

plot

Question 6

Data science is a rapidly expanding field and no degree program can hope to teach you everything that will be helpful to you as a data scientist. So it’s important that you become familiar with looking up documentation and learning how to read it.

Below is a section of code that plots a three-dimensional “wireframe” plot. You’ll see what that means when you draw it. Replace each # Your answer here with a description of what the line above does, what the arguments being passed in are, and how the arguments are used in the function. For example,

1
2
np.arange(2, 5, 0.2)
# This returns an array of numbers from 2 to 5 with an interval size of 0.2

Hint: The Shift + Tab tip from earlier in the notebook may help here. Remember that objects must be defined in order for the documentation shortcut to work; for example, all of the documentation will show for method calls from np since we’ve already executed import numpy as np. However, since z is not yet defined in the kernel, z.reshape() will not show documentation until you run the line z = np.cos(squared).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
from mpl_toolkits.mplot3d import axes3d

u = np.linspace(1.5*np.pi, -1.5*np.pi, 100)
# This returns a array from pi*1.5 to with an interval size of 100
[x,y] = np.meshgrid(u, u)
#
squared = np.sqrt(x.flatten()**2 + y.flatten()**2)
z = np.cos(squared)
# cos funtion for squared
z = z.reshape(x.shape)
# keep the shape with x

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
# add a plot
ax.plot_wireframe(x, y, z, rstride=5, cstride=5, lw=2)
# set the 3d frame high width length
ax.view_init(elev=60, azim=25)
# show the plot
plt.savefig("figure1.png")
# store the plot

figure1

Question 8 (ungraded)

This problem will not be graded, but you should still attempt it! Suppose we want to visualize the function $g(t) = a \cdot \sin(2 \pi f t)$ while varying the values $f, a$. Generate a 2 by 2 plot that plots the function $g(t)$ as a line plot with values $f = 2, 8$ and $a = 2, 8$. Since there are 2 values of $f$ and 2 values of $a$ there are a total of 4 combinations, hence a 2 by 2 plot. The rows should vary in $f$ and the columns should vary in $a$.

Set the x limit of all figures to $[0, \pi]$ and the y limit to $[-10, 10]$. The figure size should be 8 by 8. Make sure to label your x and y axes with the appropriate value of $f$ or $a$. Additionally, make sure the x ticks are labeled $[0, \frac{\pi}{2}, \pi]$. Your overall plot should look something like this:

2by2

Hint 1: Modularize your code and use loops.

Hint 2: Are your plots too close together such that the labels are overlapping with other plots? Look at the plt.subplots_adjust function.

Hint 3: Having trouble setting the x-axis ticks and ticklabels? Look at the plt.xticks function.

Hint 4: You can add title to overall plot with plt.suptitle.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, np.pi, 10000)
fig, axs = plt.subplots(2, 2, figsize=(6, 6))
plt.style.use("fivethirtyeight")
# plt.rc('text', usetex = True)
f = [2, 8]
a = [2, 8]
for index_i, i in enumerate(f):
    for index_j, j in enumerate(a):
        axs[index_i, index_j].plot(x, j * np.sin(2 * np.pi * i * x), color="#008fd5")
        axs[index_i, index_j].set_xlabel("a:{}".format(j))
        axs[index_i, index_j].set_ylabel("f:{}".format(i))
        axs[index_i, index_j].set_xticks((0, np.pi / 2, np.pi))
        axs[index_i, index_j].set_xticklabels(["0", r"${\pi}/{2}$", r"$\pi$"])
        axs[index_i, index_j].set_yticks(range(-10, 11, 5))
        axs[index_i, index_j].set_ylim(-10, 10)
plt.xticks([0, np.pi / 2, np.pi])
plt.subplots_adjust(right=1.2, left=-0.3, bottom=-0.1)
plt.suptitle("Sine waves with varying a=[2,8],f=[2,8]", ha="center")
plt.show()

question8