Downloading Datasets

Some students want to download notebooks and datasets and work on their personal computers. We do not recommend using a personal computer for your normal course work. The JupyterHub environment and our containers have been tested with the course materials and provide a standardized environment. Our faculty, TAs, and staff have access to your JupyterHub servers to debug issues. We do not have the staff resources to help students debug local installations, which vary considerably with your equipment, OS, and versions of Python, R, Jupyter, and additional libraries.

However, if you do choose to do extra experimentation on your local computer, there are a couple of options for downloading the datasets. Please be aware that we do not provide TA or instructor support for this.

WARNING: Do not try to download the entire dataset folder - it is 343 GB. Instead, download individual datasets as desired for your current work.

Method 1: Copy and download
The easiest way for beginners is probably to open a terminal, copy the dataset into your own Jupyter directory, and then download it through the Jupyter interface. The copy command follows this pattern:
cp <path to the dataset> <path to the copy>
For instance, to copy the baby names dataset:
cp /dsa/data/all_datasets/baby-names/NationalNames1.csv NationalNames1.csv
Assuming you are in the root Jupyter directory when you run that command, NationalNames1.csv will appear in your root directory. Select it with the checkbox next to it and click the download button at the top. After downloading the dataset to your personal computer, remove it from your Jupyter directory using the following command:
rm NationalNames1.csv
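
The copy step above can be wrapped in a small helper that checks the size of a file before copying it, which may help avoid pulling something unexpectedly large into your Jupyter directory. This is only a sketch: copy_dataset is a hypothetical function name, not something provided by the course environment, and it refuses directories on purpose so that the whole dataset folder is never copied by accident.

```shell
#!/usr/bin/env bash
# Sketch only: copy_dataset is a hypothetical helper, not a course-provided command.
# Usage: copy_dataset <path to the dataset> [<path to the copy>]
copy_dataset() {
    # Refuse anything that is not a regular file (for example, a whole folder).
    if [ ! -f "$1" ]; then
        echo "Not a regular file: $1" >&2
        return 1
    fi

    # Print the file size in human-readable form before copying.
    du -h "$1"

    # Copy to the given destination, or to the current directory by default.
    cp "$1" "${2:-.}"
}

# Example, run from a JupyterHub terminal:
#   copy_dataset /dsa/data/all_datasets/baby-names/NationalNames1.csv
# After downloading the file through the Jupyter interface:
#   rm NationalNames1.csv
```

The cleanup step is left as a manual rm so that the copy is only removed once you have confirmed the download finished.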

Method 2: Terminal commands or client application
You may also use scp, sftp, or rsync to download from the /dsa/data/all_datasets directory on lz.dsa.missouri.edu, using either the command line on your own machine or a client application. I will provide an example using the WinSCP client on a Windows machine, which is what I personally use.

  • Download and install WinSCP: https://winscp.net/eng/index.php
  • Download and install the university VPN AnyConnect:
    Go to https://anyconnect.missouri.edu and choose the Generic group. Log in to the self-service portal with your university user id and password. If you need additional instructions on downloading the VPN client, go to https://doitservices.missouri.edu and search for VPN.
  • Connect to the AnyConnect VPN:
    Connect to the AnyConnect VPN using Windows
    Connect to the AnyConnect VPN using an iOS Device
    Connect to the AnyConnect VPN using Mac
  • In WinSCP, use the New Site option to set up a session for the lz.dsa.missouri.edu server with the SFTP protocol, port 22, and your university pawprint and password. Save the session for future use under a name of your choosing. Highlight that named session and click the Login button to connect. In the left-hand panel, navigate to the folder on your machine where you wish to store the dataset. In the right-hand panel, a drop-down box above the visible folders and files lets you navigate to the correct folder. Starting from the root folder, navigate to /dsa/data/all_datasets and then to the folder of the dataset you wish to download (for instance, /dsa/data/all_datasets/baby-names/ for the baby names dataset). Drag and drop the dataset from the right panel to the left panel.
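
If you prefer the command line on your own machine to a client application, the same transfer can be sketched with scp. The host, port, and dataset path below come from the instructions above. The function name fetch_dataset and the PAWPRINT and DRY_RUN variables are hypothetical names chosen for illustration. The sketch assumes you are already connected to the VPN and that scp is installed locally. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_dataset, PAWPRINT, and DRY_RUN are hypothetical names.
REMOTE_HOST="lz.dsa.missouri.edu"
REMOTE_ROOT="/dsa/data/all_datasets"

# Usage: fetch_dataset <file path under REMOTE_ROOT> [<local destination>]
fetch_dataset() {
    rel_path="$1"
    dest="${2:-.}"
    user="${PAWPRINT:-your_pawprint}"

    # Require a specific file path so the whole dataset folder is never requested.
    if [ -z "$rel_path" ]; then
        echo "usage: fetch_dataset <path under $REMOTE_ROOT> [destination]" >&2
        return 1
    fi

    # Build the scp command. There is no -r flag, so directories are not copied.
    set -- scp -P 22 "${user}@${REMOTE_HOST}:${REMOTE_ROOT}/${rel_path}" "$dest"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): show the command without contacting the server.
        echo "$*"
    else
        "$@"
    fi
}

# Example: print the command for the baby names dataset.
#   PAWPRINT=abc123 fetch_dataset baby-names/NationalNames1.csv
# To actually download, set DRY_RUN=0 first.
```

sftp and rsync accept the same user@host:path form for the remote file, so the sketch could be adapted to either tool.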