So you want to set up Jupyter Notebooks and Run Python in the Cloud, then actually get the Data?
This is a comprehensive set of steps. In Part 1, we set up an account and reserve a virtual machine using Google's Compute Engine. In Part 2, we install and run a Python 3 Jupyter Notebook. In Part 3, we set up a Google Cloud Storage bucket to simplify the process of transferring files to and from the VM.
First, a quick disclaimer - I'm not an Ubuntu expert, and I can't claim this setup is any more secure than the basics of not handing out a password. For the time being, it's best not to share any of your specifics here with anyone.
Second, this write-up assumes you have basic familiarity with command line navigation. If not, here is a quick primer:
Move into a subfolder from the current path: cd ./[folder name]
Move up a folder level: cd ../
Home directory: ~
Create a subfolder in the current working directory: mkdir [folder name]
Create a text file in the current directory: touch [filename].txt
List contents of current directory (subfolders and files): ls
Display full path of current working directory: pwd
Finally, note that in the browser SSH window we'll use later, any time you highlight anything it will be automatically copied. Be careful about pasting things in without first de-selecting what's in the window.
Step 1: The Setup
Step 1A: Create a project
Search for Google Cloud Compute Engine, which should take you to a page with a console link.
In the upper left, there should be an option to start a new project. (If you've used anything with an API in the past, you can see it there as well.) Create a new project and write down the name that Google assigns it; you can see it from the projects drop-down later if you forget. If you haven't already, you will need to attach a payment account, but Google gives you $300 over 365 days for free.
Important: Stop your instance when you're done to save money! If you select Static instead of Ephemeral for the external IP (below) and keep the boot disk, you'll be charged a little more, but all your data will stay saved when you shut the VM down. Running CPUs are more expensive by orders of magnitude, so this is more effective in the long run.
Step 1B: Creating a VM
Here is a summary of the salient points:
- Select a region - the video used us-west1-b, but I used us-east1-b and it was fine
- Select machine type - I used 8 vCPUs, but this can be changed later
- Boot disk -
- The video used Ubuntu 16.04 LTS (Xenial); I used 18.04 LTS (Bionic). I recommend the latter for Python 3 compatibility.
- At the bottom of this page, you can select more than 10 GB of persistent disk if you think it will be necessary, but this could incur additional charges.
- Firewall
- Check allow HTTP and HTTPS traffic
- click show more, then Disks, then uncheck “delete boot disk when instance is deleted”
- Continue - this may take a moment
- Once it is created, copy down the External IP on the instance (something like ###.###.###.###)
- Important! When you are done with the session, check the instance and click Stop
- Your static storage will be saved, so no worries there
At this point the VM is established, but we need some additional configuration.
- Click on the horizontal bars in the upper left
- VPC Network -> VPC Network -> External IP addresses
- On the instance, change Type Ephemeral to Static, and record the name (I used blogexample)
- From the left bar, select Firewall rules, then from the top, Create firewall rule
- Give it a name and record it (e.g. myfirewallrule)
- Change Targets from Specified target tags to All instances in the network
- Source IP Ranges: 0.0.0.0/0
- Under Specified protocols and ports, check "TCP" and give it a port number (I used 5000; if you're following along with the video, he first uses 1000, which is reserved and will not work)
From the horizontal bars, head back to Compute Engine - VM Instance
Step 2: Configuring the VM
Step 2A: Installing python3, pip3, and jupyter notebooks
Skip to the summary below for a list of these steps. If you are following along with the video, installing Python 3 instead of Python 2 is the main point of divergence here. To save some space: anywhere it asks you to continue, enter 'y'; any time you are given two options, enter [1].
- From the VM instance screen, click SSH
- Record user name and instance, for example username@instance-1. This will be needed for copying files
- In order:
- sudo apt-get update
- sudo apt-get --assume-yes upgrade (keep local version)
- sudo apt-get --assume-yes install software-properties-common
- sudo apt-get install python-setuptools python-dev build-essential
- From the newer Ubuntu: sudo apt install python3-pip
- For the example in the video: sudo easy_install pip
- From my example: sudo pip3 install jupyter
- From older (video example): sudo pip install jupyter
- jupyter notebook --generate-config
- sudo nano ~/.jupyter/jupyter_notebook_config.py
- Press the down arrow to navigate to immediately under the line # Configuration file for jupyter-notebook., then paste in the block shown just after this list:
- Save and exit: press Ctrl + O (the letter O), then Enter; then press Ctrl + X
- Enter: jupyter notebook password, then type your password. The cursor will not move, but copy and paste works fine
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 5000
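One hedge worth noting: on newer Jupyter installations, the server reads its settings from a ServerApp class instead of NotebookApp. If the block above seems to be ignored on your version, the equivalent settings would look like this (same idea, just a different class name):
c = get_config()
c.ServerApp.ip = '0.0.0.0'          # listen on all interfaces
c.ServerApp.open_browser = False    # don't try to open a browser on the VM
c.ServerApp.port = 5000             # match the firewall port you opened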
Step 2B: Running Jupyter Notebook
These steps stand alone from the ones above and are all that you need to do to run Jupyter notebooks. Once set up, this can be done from any point in the command line. The first couple of steps deal with setting up subfolders to work in, which I like to do to avoid writing things in the home directory's root. You don't have to, but the rest of the cloud steps assume you have these subfolders set up.
Note that in the browser window, Jupyter will launch from whatever directory you’re in when you type the open command. From the SSH connection, do the following
- To set up project paths:
- from command line: mkdir projects
- cd ./projects
- Launch Jupyter notebook:
- jupyter notebook --ip=0.0.0.0 --port=5000 (or whatever port you chose)
- Go to [External IP]:[port number] (e.g. ###.###.###.###:5000, using the external IP you recorded earlier) in a new browser tab
- Enter password and run Jupyter
- Installing packages
- pip3 install can be run directly from a second SSH window (or a second tmux pane, if you use tmux)
- Or, from Jupyter notebook you can use !sudo pip3 install pandas (or whatever) and it will work fine
- Note: there is one difference. With my installation (Python 3), you will need to use pip3 instead of pip, which otherwise works the same
- Closing Jupyter
- Quit as normal
- From command line, Ctrl + C, then y
- Close everything by typing exit
At this point, you are all set up, but of course you want to actually use data from somewhere.
Note: When setting up a new notebook, you may need to use !pip3 install instead of !pip install
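For example, a typical first cell in a fresh notebook might look like the following (pandas here is just a placeholder for whichever package you actually need):
!pip3 install pandas     # shell-escape install from inside the notebook; prefix with sudo if you hit permission errors
import pandas as pd
print(pd.__version__)    # quick sanity check that the package is importable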
Step 3: Setting Up Cloud Storage as an intermediary
Now the good stuff. By 'logging in' with Google's cloud tools, we can use the gsutil utility to access files in Cloud Storage buckets. This makes it much easier to get files onto your VM. Naturally, you'll want to pull them back down when you're finished.
Step 3A: Set Up a Google Cloud Storage Bucket
- From The Google Cloud Platform console, click the horizontal bars and go to Storage, then Browser (Make sure you are still in the same project you used to create the VM)
- Create a bucket, record its name, and use the Regional configuration. In this example, I use sample-blog-bucket
- Add some data and a directory
- Upload files by clicking button or dragging
- For this example, I’m creating a subfolder called data
- Clicking on data, I’m uploading a sample file called description_wordvec.csv
Step 3B: Allowing Communication Between the Project VM and Storage by Establishing an API Key
This is the crucial step!
- From the main menu, go to IAM & admin -> Service Accounts
- From Actions on the right, select Create key, and use the default JSON
- Save this file somewhere secure!
- You will not need the file in this configuration
Step 3C: Telling your VM how to find it
- Back to Compute Engine VM Instances
- SSH to your instance
- Type: gcloud init
- Choice: 1
- Project ID: use project ID from above (e.g. blogexample)
- Press enter one more time
- Secure login to account by typing and entering: gcloud auth login
- Do you want to continue: y
- Copy the full URL into a browser, log in with the active account, and click Allow
- Copy the generated verification string back into the SSH window (make sure you no longer have the URL copied in the window at this point, or the command-line app will paste that instead)
Step 3D: Moving Data Between Cloud Storage and VM
Once you are logged in with gcloud auth login from the step above, type "gsutil cp [filepath from] [filepath to]" in the SSH command line to copy files back and forth. Essentially, your bucket is at a location prefixed with "gs://". A few worked examples follow, and a short sketch after the list shows how to open a copied file from a notebook.
- Copy from bucket (to a folder with the VM path ~/projects/):
- The general string looks like this: gsutil cp gs://[bucket-name]/[filepath] [VM path to copy to]
- Single file from bucket to projects folder: gsutil cp gs://sample-blog-bucket/data/description_wordvec.csv ~/projects/
- Careful with the spaces
- Whole directory: gsutil cp -r gs://sample-blog-bucket/data ~/projects/
- Copy to bucket
- You can create a file as normal from Jupyter, but for this example I’m creating a folder under projects called dl_files, and adding two files (test1.txt and test2.txt) for the example
- The general string looks like this: gsutil cp [file] gs://[bucket-name]/[filepath]
- Single file: gsutil cp ~/projects/dl_files/test1.txt gs://sample-blog-bucket/data/
- Whole directory: gsutil cp -r ~/projects/dl_files gs://sample-blog-bucket/data/
- Contents of directory only: gsutil cp -r ~/projects/dl_files/* gs://sample-blog-bucket/data/
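Once a file is on the VM, you can open it from a notebook as usual. A minimal sketch, assuming the notebook was launched from ~/projects/ and that description_wordvec.csv has a header row:
import pandas as pd

# the file copied down from the bucket in the example above
df = pd.read_csv('description_wordvec.csv')
print(df.shape)    # quick sanity check on rows and columns
df.head()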
Step 3E: Copying Data from the Bucket to Your Local Machine
In the Cloud Storage browser, right-click on a file name, then choose "Save link as..." to download :)
On Pickles...
This seems like a good place, so I’ll include it here. Often, the biggest memory hog for data science is a trained model that we want to store.
Using the above steps, we can also easily move a model trained on a VM back down to our local drive for export to Kaggle kernels and the like.
Assuming you have used sklearn to train a model called “my_model”,
First: import pickle
Save the model: pickle.dump(my_model, open('ultimate_model.sav', 'wb'))
Open it back up later: my_model = pickle.load(open('ultimate_model.sav', 'rb'))
As usual, your filename will include the directory path where you would like it to be located.
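Putting that together, here is a minimal end-to-end sketch. The iris data and logistic regression are just stand-ins for whatever model you actually trained, and ultimate_model.sav is an arbitrary filename:
import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# stand-in for your real training step
X, y = load_iris(return_X_y=True)
my_model = LogisticRegression(max_iter=500).fit(X, y)

# save the fitted model to disk
with open('ultimate_model.sav', 'wb') as f:
    pickle.dump(my_model, f)

# ...later, load it back up and confirm it still works
with open('ultimate_model.sav', 'rb') as f:
    restored_model = pickle.load(f)

print(restored_model.score(X, y))    # should match the original model's score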
Summary
Configuring VM code
Copy and paste your specifics here to help follow along:
[firewall]
0.0.0.0/0
[port number]
external IP: [IP number]
user_name@instance-1
sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install software-properties-common
sudo apt-get install python-setuptools python-dev build-essential
[y]
sudo apt install python3-pip
sudo pip3 install jupyter
jupyter notebook --generate-config
sudo nano ~/.jupyter/jupyter_notebook_config.py
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 5000
jupyter notebook password
ctrl+o [enter]
ctrl+x
password:
jupyter notebook --ip=0.0.0.0 --port=5000
Per Session
What you actually need to do to run Jupyter once set up:
- Click SSH, run jupyter notebook --ip=0.0.0.0 --port=[your port number]
- In a browser window, go to [VM IP Address]:[Port Number], enter your password
- If desired, copy files to and from the cloud bucket with gsutil
- If desired, install new Python3 packages using !pip3 install from inside a Jupyter Notebook
What you don’t need to do:
- Run config files
- Re-authorize your VM using gcloud auth login
- Re-load files from cloud bucket that have already been stored
Please let me know if you see any issues with this setup.