
I want to install a library on my Azure Databricks cluster, but I cannot use the UI method: my cluster changes every time, and while it is transitioning I cannot add the library to it through the UI. Is there a Databricks utility command for doing this?

Nikunj Kakadiya
Samyak Jain
  • Have you tried the Databricks [Libraries CLI](https://docs.databricks.com/dev-tools/cli/libraries-cli.html)? You can use it to install the library from DBFS. – jose praveen Mar 06 '20 at 04:43
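The Libraries CLI suggested above calls the Databricks Libraries REST API (`POST /api/2.0/libraries/install`), which can also be called directly from a script or job so the install is repeated whenever a new cluster comes up. This is a minimal sketch, not an official client: the helper names, workspace URL, token, cluster ID, package pin and wheel path are all hypothetical placeholders, and the payload shape assumes version 2.0 of the API.

```python
import json
import urllib.request


def build_install_payload(cluster_id, pypi_packages=(), whl_paths=()):
    """Build the JSON body for POST /api/2.0/libraries/install.

    Each PyPI requirement becomes {"pypi": {"package": ...}} and each wheel
    already uploaded to DBFS becomes {"whl": "dbfs:/..."}.
    """
    libraries = [{"pypi": {"package": pkg}} for pkg in pypi_packages]
    libraries += [{"whl": path} for path in whl_paths]
    return {"cluster_id": cluster_id, "libraries": libraries}


def install_libraries(host, token, payload):
    """POST the payload to the workspace; host and token are values you supply."""
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values: replace with your own cluster ID and wheel path.
    payload = build_install_payload(
        "0123-456789-abcdefgh",
        pypi_packages=["requests==2.31.0"],
        whl_paths=["dbfs:/FileStore/wheels/my_lib-0.1-py3-none-any.whl"],
    )
    print(json.dumps(payload, indent=2))
    # install_libraries("https://<workspace-url>", "<personal-access-token>", payload)
```

The install request is asynchronous: the call returns once it is accepted, and `GET /api/2.0/libraries/cluster-status` can be polled to see when the libraries are actually installed.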

2 Answers


@CHEEKATLAPRADEEP-MSFT's answer is awesome! Just to add to it:

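If you want all your notebooks/clusters to have the same libraries installed, you can take advantage of cluster-scoped or global (new feature) init scripts, which run on every node each time the cluster starts, so the libraries are reinstalled automatically whenever the cluster changes.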

The example below installs packages from PyPI:

#!/bin/sh

# Install dependencies
pip install --upgrade boto3 psycopg2-binary requests simple-salesforce
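If the package list changes often, the init script itself can be generated instead of hand-edited. The following is a hypothetical sketch: it renders a script like the one above from a Python list and returns an `init_scripts` entry for a cluster spec. The helper names are made up, and the workspace-file destination type is an assumption to check against your workspace (older setups pointed at scripts stored on DBFS). Databricks examples often call the cluster's own pip as `/databricks/python/bin/pip`, so `pip_cmd` is left configurable.

```python
import shlex


def render_init_script(packages, pip_cmd="pip"):
    """Return the text of a cluster-scoped init script that pip-installs `packages`."""
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each requirement so specifiers like 'requests>=2' survive the shell.
    quoted = " ".join(shlex.quote(pkg) for pkg in packages)
    return "\n".join([
        "#!/bin/sh",
        "",
        "# Install dependencies",
        f"{pip_cmd} install --upgrade {quoted}",
        "",
    ])


def init_script_entry(workspace_path):
    """Cluster-spec entry for a script stored as a workspace file (assumed location type)."""
    return {"workspace": {"destination": workspace_path}}


if __name__ == "__main__":
    print(render_init_script(["boto3", "psycopg2-binary", "requests", "simple-salesforce"]))
    # Hypothetical path: upload the script there, then reference it in the cluster config.
    print({"init_scripts": [init_script_entry("/Shared/init/install-libs.sh")]})
```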

You can even use a private package index - for example AWS CodeArtifact:

# Install the AWS CLI
pip install --upgrade awscli

# Configure pip
aws codeartifact login --region <REGION> --tool pip --domain <DOMAIN> --domain-owner <AWS_ACCOUNT_ID> --repository <REPO>
pip config set global.extra-index-url https://pypi.org/simple

Note: the cluster instance profile must be allowed to get CodeArtifact credentials (arn:aws:iam::aws:policy/AWSCodeArtifactReadOnlyAccess).
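The `aws codeartifact login --tool pip` call above fetches a temporary token and writes a pip `index-url` that points at the repository with the token embedded. If you would rather build that URL yourself (for example to pass it to `pip install --index-url`), here is a minimal sketch. The boto3 `codeartifact` client methods are real, but the helper names are hypothetical and the domain, owner, repository and region values are placeholders. Because the token is temporary, this would need to run at cluster start, e.g. from the init script.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def pip_index_url(repository_endpoint, auth_token):
    """Turn a CodeArtifact PyPI repository endpoint into a pip index URL.

    The endpoint looks like
    https://<domain>-<owner>.d.codeartifact.<region>.amazonaws.com/pypi/<repo>/
    and pip expects basic auth with user "aws" plus a trailing "simple/".
    """
    parts = urlsplit(repository_endpoint)
    path = parts.path.rstrip("/") + "/simple/"
    netloc = f"aws:{quote(auth_token, safe='')}@{parts.hostname}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


def fetch_index_url(domain, owner, repository, region):
    """Look up the endpoint and a token (needs CodeArtifact read credentials)."""
    import boto3  # imported lazily so pip_index_url works without boto3

    client = boto3.client("codeartifact", region_name=region)
    token = client.get_authorization_token(
        domain=domain, domainOwner=owner
    )["authorizationToken"]
    endpoint = client.get_repository_endpoint(
        domain=domain, domainOwner=owner, repository=repository, format="pypi"
    )["repositoryEndpoint"]
    return pip_index_url(endpoint, token)
```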

Cheers

saza
  • Follow up question. How do you configure the instance to get the AWS credentials? – pmanDS May 11 '22 at 06:50
  • @pmanDS We currently attach an "Instance profile" in the Advanced options of the cluster configuration page. – saza Jul 19 '22 at 01:14
  • @CHEEKATLAPRADEEP's answer? I don't see it here. Are you referring to some other post, or perhaps it was deleted? – StatsStudent Dec 16 '22 at 08:33
  • That answer was removed by a Stack Overflow moderator and can't be restored until another moderator chimes in. – Alex Ott Feb 23 '23 at 15:10

You can use the %pip install magic command to install the required libraries from within your notebook code. This documentation provides further detail on its usage: https://docs.databricks.com/libraries/notebooks-python-libraries.html. For example:

%pip install requests
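After a `%pip install` in one cell, a later Python cell can confirm which version actually ended up in the notebook environment. A small sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name is hypothetical. If a package that was already imported gets upgraded, the Python process may need a restart (`dbutils.library.restartPython()` on recent runtimes) before the new version is picked up.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run in a cell after `%pip install requests`.
    print("requests:", installed_version("requests"))
```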

Older runtimes also had the dbutils.library utility (https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-library), but it has been deprecated in favor of %pip.