I have created a Google Dataproc cluster, but I now have a requirement to install Presto on it. Presto is provided as an initialization action on Dataproc here. How can I run this initialization action after the cluster has been created?
2 Answers
1
You could use the --initialization-actions parameter when creating the cluster.
For example:
gcloud dataproc clusters create $CLUSTERNAME \
--project $PROJECT \
--num-workers $WORKERS \
--bucket $BUCKET \
--master-machine-type $VMMASTER \
--worker-machine-type $VMWORKER \
--initialization-actions \
gs://dataproc-initialization-actions/presto/presto.sh \
--scopes cloud-platform
Maybe this script can help you: https://github.com/kanjih-ciandt/script-dataproc-datalab

hkanjih
- I already have a cluster created and I can't tear it down, so I need to run the initialization action on an existing cluster. But your answer suggests creating a new cluster. – Pramod Sripada Oct 21 '17 at 01:07
- OK, I got it. You could SSH into the master and the workers and execute the presto.sh script on each of them. (Get the script using gsutil cp gs://dataproc-initialization-actions/presto/presto.sh .) – hkanjih Oct 23 '17 at 17:04
1
Most init actions would probably run even after the cluster is created (though I haven't tried the Presto init action).
I like to run gcloud dataproc clusters describe to get the instance names, then run something like gcloud compute ssh <NODE> -- -T sudo bash -s < presto.sh for each node. Reference: How to use SSH to run a shell script on a remote machine?
Notes:
- Everything after the -- is passed as arguments to the normal ssh command.
- The -T means don't try to create an interactive session (otherwise you'll get a warning like "Pseudo-terminal will not be allocated because stdin is not a terminal.").
- I use "sudo bash" because init action scripts assume they're being run as root.
- presto.sh must be a copy of the script on your local machine. You could alternatively ssh into each node and run gsutil cp gs://dataproc-initialization-actions/presto/presto.sh . && sudo bash presto.sh there.
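The per-node step above could be wrapped in a small loop. This is only a sketch, untested against the Presto init action: the helper names run_init_action and run_on_all_nodes, the DRY_RUN variable, and the explicit zone argument are illustrative assumptions, not part of gcloud or the init action itself. It expects node names on stdin, one per line, and a local copy of presto.sh.

```shell
#!/usr/bin/env bash
# Sketch: run a local copy of an init action script as root on each node.
set -euo pipefail

# Hypothetical helper: run the script on one node over SSH.
# With DRY_RUN=1 it only prints the command it would run.
run_init_action() {
  local node="$1" zone="$2" script="$3"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "gcloud compute ssh ${node} --zone ${zone} -- -T sudo bash -s < ${script}"
  else
    # ssh reads the script from stdin, so it does not consume the node list.
    gcloud compute ssh "${node}" --zone "${zone}" -- -T sudo bash -s < "${script}"
  fi
}

# Hypothetical helper: read node names from stdin (one per line)
# and run the script on each, skipping blank lines.
run_on_all_nodes() {
  local zone="$1" script="$2" node
  while read -r node; do
    if [[ -n "${node}" ]]; then
      run_init_action "${node}" "${zone}" "${script}"
    fi
  done
}
```

With the functions loaded, a dry run such as printf '%s\n' <MASTER> <WORKER-0> | DRY_RUN=1 run_on_all_nodes <ZONE> presto.sh prints the commands first, so you can check the node names taken from the describe output before running anything as root.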
But @Kanji Hara is correct in general. Spinning up a new cluster is pretty fast/painless, so we advocate using initialization actions when creating a cluster.

Karthik Palaniappan