
I have created a Google Dataproc cluster, but I now have a requirement to install Presto on it. Presto is provided as an initialization action on Dataproc. How can I run this initialization action after the cluster has been created?

Pramod Sripada

2 Answers


You could use the --initialization-actions flag when creating the cluster.

For example:

gcloud dataproc clusters create $CLUSTERNAME \
    --project $PROJECT \
    --num-workers $WORKERS \
    --bucket $BUCKET \
    --master-machine-type $VMMASTER \
    --worker-machine-type $VMWORKER \
    --initialization-actions \
         gs://dataproc-initialization-actions/presto/presto.sh \
    --scopes cloud-platform

Maybe this script can help you: https://github.com/kanjih-ciandt/script-dataproc-datalab

hkanjih
  • I already have a cluster created and I can't tear it down, so I need to run the initialization action on an existing cluster. But your answer suggests creating a new cluster. – Pramod Sripada Oct 21 '17 at 01:07
  • OK, I got it. In that case you could log in to the master and the workers and execute the presto.sh script on each of them (get the script with gsutil cp gs://dataproc-initialization-actions/presto/presto.sh .). – hkanjih Oct 23 '17 at 17:04

Most init actions would probably run even after the cluster is created (though I haven't tried the Presto init action).

I like to run gcloud dataproc clusters describe <CLUSTER> to get the instance names, then run something like gcloud compute ssh <NODE> -- -T sudo bash -s < presto.sh for each node. Reference: How to use SSH to run a shell script on a remote machine?

Notes:

  • Everything after the -- is passed as arguments to the normal ssh command.
  • The -T means don't try to create an interactive session (otherwise you'll get a warning like "Pseudo-terminal will not be allocated because stdin is not a terminal.").
  • I use "sudo bash" because init action scripts assume they're being run as root.
  • presto.sh must be a copy of the script on your local machine. Alternatively, you could ssh into each node and run: gsutil cp gs://dataproc-initialization-actions/presto/presto.sh . && sudo bash presto.sh
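
Putting the notes above together, the per-node step could be wrapped in a small shell function. This is only a sketch: run_on_nodes is a hypothetical helper name, the zone and node names are placeholders you would take from the clusters describe output, and I haven't verified that the Presto script itself behaves correctly when run after cluster creation.

```shell
# Sketch: run a local copy of an init action script as root on each named node.
# run_on_nodes is a hypothetical helper, not part of gcloud.
# Usage: run_on_nodes <local-script> <zone> <node> [<node> ...]
run_on_nodes() {
  local script="$1"
  local zone="$2"
  shift 2
  local node
  for node in "$@"; do
    echo "Running ${script} on ${node}"
    # Everything after -- goes to ssh; -T disables pseudo-terminal allocation.
    # The local script is fed to "sudo bash -s" on the node via stdin.
    gcloud compute ssh "${node}" --zone "${zone}" -- -T sudo bash -s < "${script}" || return 1
  done
}
```

It would be called with the master and worker names, for example run_on_nodes presto.sh us-central1-a my-cluster-m my-cluster-w-0 my-cluster-w-1 (all of these values are made up). The function stops at the first node where the command fails, so a broken install is not repeated across the whole cluster.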

But @Kanji Hara is correct in general. Spinning up a new cluster is pretty fast/painless, so we advocate using initialization actions when creating a cluster.

Karthik Palaniappan