Regarding products within Google Cloud Platform, you can create a Datalab instance to run your notebooks and specify the desired machine type with the `--machine-type` flag (docs). You can use a high-memory machine if needed.
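For instance, a minimal sketch (the instance name `my-datalab` and the machine type are placeholders; adjust them to your needs):

```bash
# Create a Datalab instance backed by a high-memory machine type.
datalab create my-datalab --machine-type n1-highmem-8
```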
Of course, you can also use Dataproc as you already proposed. For an easier setup, you can use the predefined initialization action with the following parameter upon cluster creation:

```
--initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh
```
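A full cluster creation command might look like this (the cluster name `my-cluster` is a placeholder):

```bash
# Create a Dataproc cluster that installs Datalab via the initialization action.
gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh
```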
Edit:
As you are using a GCE instance, you can also use a script to automatically shut down the VM when you are not using it. You can edit `~/.bash_logout` so that it checks whether yours is the last session and, if so, stops the VM:

```bash
# Stop the instance when the last interactive session logs out.
if [ $(who | wc -l) == 1 ]; then
  gcloud compute instances stop $(hostname) \
      --zone $(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/zone 2>/dev/null | cut -d/ -f4) \
      --quiet
fi
```
Or, if you prefer a `curl` approach:

```bash
# Stop the instance via the Compute Engine REST API instead.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://www.googleapis.com/compute/v1/projects/$(gcloud config get-value project 2>/dev/null)/zones/$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/zone 2>/dev/null | cut -d/ -f4)/instances/$(hostname)/stop -d ""
```
Keep in mind that you might need to update Cloud SDK components to get the `gcloud` command to work. Either use:

```bash
gcloud components update
```

or

```bash
sudo apt-get update && sudo apt-get --only-upgrade install kubectl google-cloud-sdk google-cloud-sdk-datastore-emulator google-cloud-sdk-pubsub-emulator google-cloud-sdk-app-engine-go google-cloud-sdk-app-engine-java google-cloud-sdk-app-engine-python google-cloud-sdk-cbt google-cloud-sdk-bigtable-emulator google-cloud-sdk-datalab -y
```
You can include one of these update commands, as well as the `~/.bash_logout` edit, in your startup-script.
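For reference, a minimal startup-script sketch that appends the auto-shutdown check to a user's `~/.bash_logout` (the user name `jupyter` and the file name `startup.sh` are placeholders, not something defined above):

```bash
#!/bin/bash
# startup.sh: runs at boot; appends the auto-shutdown snippet to the
# login user's ~/.bash_logout if it is not already there.
# "jupyter" is a placeholder user name; adjust to your own login user.
BASH_LOGOUT=/home/jupyter/.bash_logout
if ! grep -q 'gcloud compute instances stop' "$BASH_LOGOUT" 2>/dev/null; then
  cat >> "$BASH_LOGOUT" <<'EOF'
if [ $(who | wc -l) == 1 ]; then
  gcloud compute instances stop $(hostname) \
      --zone $(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/zone 2>/dev/null | cut -d/ -f4) \
      --quiet
fi
EOF
fi
```

You can then pass it when creating the instance, e.g. with `gcloud compute instances create my-notebook-vm --metadata-from-file startup-script=startup.sh` (again, `my-notebook-vm` is a placeholder).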