JSC270, Winter 2020 - Prof. Chevalier

Laboratory 1.0 - "Hello JSC270"

Pre-lab assignment

Due date: January 8, 2020 at 12:00pm (noon)

This assignment is individual.

We will be using Jupyter Notebook together with Github throughout the course.

For this pre-lab assignment, you are required to 1) enter your answers directly in this notebook using Python and Markdown, and 2) submit your work by committing your files to Github Classroom and uploading a pdf of your notebook to Quercus. The following instructions walk you through the whole process, from setting up your workspace to submitting your work.



Objectives

Set up your workspace; familiarize yourself with jupyter notebook; perform some initial exploration of data before the lab; formulate research questions.

Instructions

  • Make a copy of this notebook (see details below).
  • Answer the questions in the cells that indicate where your answers should be placed. You may remove the initial comment in the code cell. For example, for Question III.1, edit the content of the markdown cell with comment `Enter your answers in this cell. You may remove this text`.
  • Make sure that you explain your solutions when asked and comment your code for readability.
  • Commit and push changes to the Github classroom repository that has been provided (more details below).
  • Submit a pdf report to Quercus (more details below).


I. Set up your workspace

  1. If you don't already have one, create an account on Github.
  2. Follow the instructions in this git basics walkthrough:
    • Make a copy of this notebook to your github account using this Github Classroom link: https://classroom.github.com/a/Wwxq8088.
    • Launch jupyter using the following command line from your terminal. This should automatically open a jupyter navigator in your web browser:
       jupyter notebook
    • Open the notebook (.ipynb file) in your browser. You will be able to edit the notebook directly from your browser.

II. Quick overview of a Jupyter notebook

JSC270

This is an example of a markdown cell. Double-click on the cell to see the markdown.

Header 1

Header 2

Some inline $\LaTeX$: $\alpha = 0.05, \beta = 0.2 \Rightarrow \alpha+\beta=0.25.$ If $\LaTeX$ is enclosed between $$ \int_{-\infty}^{\infty} \exp({-x^2/2})dx = \sqrt{2 \pi} $$ then it's displayed on its own line.

$$ \int_{-\infty}^{\infty} \exp({-x^2/2})dx = \sqrt{2 \pi}$$

Some text in bold.

some text indented

Markdown cells can display HTML.

<h3> Header 3 </h3>
<p> This is a paragraph using HTML. <br>
    <mark> This is important so it's marked in yellow. </mark>  </p>

Header 3

This is a paragraph using HTML.
This is important so it's marked in yellow.

Images can be displayed using markdown or html.

To display a picture of the Toronto skyline (Photo by Berkay Gumustekin on Unsplash):

![](toronto.jpg)

NB: Markdown has no syntax for specifying the dimensions of an image; if this is important to you, you can simply use regular HTML <img> tags.

or HTML code:

<img src='toronto.jpg' style="width:400px; height:400px;">




III. A little bit about yourself...

Answer the following questions by entering your answers in the corresponding cells.

Question 1: Fill out the information below.

Enter your answers in this cell. You may remove this text

  • First (official) name:
  • Last name:
  • Preferred name:
  • Student number:
  • UTORID:
  • Github ID:
  • Portrait photo: <insert here a headshot where you are clearly recognizable>

Question 2: Explain, in a few sentences, what you expect from this course.

Enter your answer in this cell. You may remove this text

Question 3: A professional data scientist is expected to be strong in the following skills. You are just starting to learn this discipline, so it is perfectly normal that you have not yet mastered all (or indeed any!) of these skills. Take a moment to think about your goals for the term. Which skills do you feel you most need to develop? Indicate your level in each skill with an "X" in the table, using the following ratings:

  1. Not my strength. I need to practice this skill.
  2. I am generally ok, but want to develop this skill further.
  3. I am quite good at this skill, and plan to keep it up.
  4. I am a master, and can help others further develop this skill.

Add an "X" mark in the corresponding column, for each row in this cell. You may remove this text

| Skill | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| Programming (python / R) | | | | |
| Statistical analysis | | | | |
| Writing (technical reports) | | | | |
| Data Visualization | | | | |
| Public speaking | | | | |
| Team work | | | | |




IV. Pre-Lab Exercise: Toronto Bike Share

We will study the Toronto Bike Share data in this first laboratory. These questions will get you started with some Python basics and data manipulations.


Download the 2017 data set from the Toronto Bike Share website. The data is stored in comma-separated values (csv) files, one for each quarter of the year.

Here is an example of a basic .csv file:

1/2/2014,5,8,red
1/3/2014,5,2,green
1/4/2014,9,1,blue

There are several options to choose from to load data from a csv file into a Python data structure. We list two common approaches here:

Option 1: Use the python csv library.
See the csv library documentation for how to read and write csv files. Starter code is provided below.

1/2/2014 red
1/3/2014 green
1/4/2014 blue

In the above example, we read data from the data.csv file using the python csv library, and display the first and last fields of each row.
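The example described above could be written along the following lines. This is only a minimal sketch: it first recreates the three-line data.csv sample shown earlier (overwriting any file of that name in the working directory) so that it runs on its own, and the helper name `first_and_last` is a hypothetical one chosen for illustration.

```python
import csv

# Recreate the small sample file from the text so the sketch is self-contained.
# Assumption: it is fine to (over)write data.csv in the current directory.
sample_rows = [
    ["1/2/2014", "5", "8", "red"],
    ["1/3/2014", "5", "2", "green"],
    ["1/4/2014", "9", "1", "blue"],
]
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(sample_rows)


def first_and_last(path):
    """Return (first field, last field) for every row of a csv file."""
    with open(path, newline="") as f:
        # csv.reader yields each row as a list of strings.
        return [(row[0], row[-1]) for row in csv.reader(f)]


# Display the first and last fields of each row.
for first, last in first_and_last("data.csv"):
    print(first, last)
```

Note that `csv.reader` returns every field as a string, so numeric columns would need to be converted explicitly (for example with `int(...)`) before doing any arithmetic.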

Option 2: Another option (recommended for this lab, and beyond) is to use the built-in csv reader of pandas (see pandas documentation and pandas cheat sheet).
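As a rough illustration of the pandas approach, the sketch below loads the same three-line sample with `pandas.read_csv`. The sample has no header row, so `header=None` is passed and the column names (`date`, `a`, `b`, `colour`) are hypothetical labels invented for this example. The text is read from an in-memory buffer only to keep the sketch self-contained; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# The sample csv content from the text, held in memory for a self-contained example.
csv_text = "1/2/2014,5,8,red\n1/3/2014,5,2,green\n1/4/2014,9,1,blue\n"

# header=None: the sample has no header row.
# names=[...]: hypothetical column labels chosen for illustration.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["date", "a", "b", "colour"],
)

# Display the first and last columns, mirroring the csv-library example.
print(df[["date", "colour"]])
```

Unlike the csv library, pandas infers column types while reading, so the two middle columns are loaded as integers rather than strings.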


Question 1. Read the file `Bikeshare Ridership (2017 Q1).csv`. Display the names of all of the columns.

Question 2. Display the number of rows in the dataset.

Question 3. Display the first few rows of the dataset.

Question 4. Display all of the rows where the travel time is less than two minutes.

Question 5. Calculate and display the min, max and average trip duration values for this quarter.

Question 6. Formulate three research questions that would be interesting to investigate about Toronto Bike Share usage. Explain why each question is an interesting one to explore.

Write your answer in this cell. You may remove this text.



Submission and Grading

Submission

Complete all of the following tasks to submit your work.

Github Classroom

  1. Use nbconvert to save this Jupyter notebook as an html document without code cells. The command line syntax is:
    jupyter nbconvert --TemplateExporter.exclude_input=True myfirstnb.ipynb
  2. Commit your work to the Github repository using the following git commands. See also the git basics walkthrough.
    git add [all files that you want to update/add to the repository]
    git commit -m "[a meaningful comment about your commit, e.g. final submission]"
    git push

Important note: Commit and push both your .ipynb and .html files to your repository, as well as all of the files (e.g. your headshot photo, the dataset you downloaded) that are necessary for re-executing your notebook from a clone of your repository.

  3. Go to your Github repository online to verify that your push was successful. Make sure that all files are properly updated in your repository.

Quercus

  1. Create a pdf of your notebook including the code cells (print this notebook to pdf), and upload the pdf to Quercus in the corresponding "L1.0" assignment.

Grading

The following grading scheme will be used for this assignment.

Note that marks will be deducted for the following reasons: the notebook doesn't run; files are missing; instructions are not followed.

| Questions | Criteria | Marks |
|---|---|---|
| Questions III.1-3 | All questions are appropriately answered | 1 |
| Questions IV.1-5 | All questions are correctly answered (1pt per question) | 5 |
| Question IV.6 | Research questions are clear and interesting. Explanations are clear and thoughtful | 9 |
| Total | | 15 |