
I'm trying to pip install a Python package from a private GitHub repo using an init.sh script that I uploaded to my S3 bucket.

This is my `init.sh` file:

    #!/bin/bash
    TOKEN={{secrets/private-repo/github}}
    pip install git+https://${TOKEN}@github.com/<path-to-repo>

When I try to create my cluster I get the following error message:

    Init script failure: Cluster scoped init script s3://<s3_bucket>/init.sh failed: Script exit status is non-zero

I created the secret through the API with scope private-repo and key github. I tested this in a notebook and it worked fine.
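For reference, a scope and key like those described above can also be created with the (legacy) Databricks CLI instead of raw API calls. A sketch, assuming a configured workspace profile; the token value is a placeholder:

```shell
# Create the secret scope, then store the GitHub token under it.
# (Legacy Databricks CLI syntax; requires an authenticated workspace.)
databricks secrets create-scope --scope private-repo
databricks secrets put --scope private-repo --key github --string-value "<github-token>"
```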

Documentation Used: https://docs.databricks.com/security/secrets/secrets.html#reference-a-secret-in-an-environment-variable

  • We'd need to see further debugging details to figure out what exactly failed. The lack of quoting around the strings is a possible problem; see [When to wrap quotes around a shell variable](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Oct 05 '22 at 07:50
  • @tripleee Sadly, that is the only error log generated. – satoshi Oct 05 '22 at 07:54
  • You can tweak up the output somewhat with `set -x` but ultimately you probably want to pass in some debugging options to `pip` – tripleee Oct 05 '22 at 07:55

1 Answer


The problem is that you're trying to refer to the secret using the `{{secrets/private-repo/github}}` syntax, but that syntax doesn't work from inside the init script.

You need to define an environment variable at the cluster level and use that secret syntax there; it will then be available inside your init script. See the documentation on that topic.

Move this line from your init script to the Cluster > Advanced options > Spark > Environment variables section:

    TOKEN={{secrets/private-repo/github}}
Alex Ott