
There is a PyPI package for pyRFC, but like other Python libraries that wrap a native C SDK, it pulls in a number of dependencies and requires setting environment variables, etc.

Is it possible to install a C-based Python library like pyRFC onto a Databricks cluster? If so, how would you go about including the SDK dependencies?

Perhaps someone has already tried with the Java version?

Suncatcher

1 Answer


Yes, it's possible. It's usually done by attaching a cluster init script to the cluster. The init script's job is to set up all necessary dependencies on every cluster node: unpack SDKs, compile libraries, install packages, etc. Usually people download the packages they need, put them on DBFS, and then access them from inside the init script via the /dbfs mount.

The script could look like this (just an example):

#!/bin/bash

# Unpack the SAP NW RFC SDK (staged on DBFS) into a local directory;
# adjust the target path if your archive layout differs
mkdir -p /usr/local/sap
tar zxvf /dbfs/FileStore/SAP-SDK.tar.gz -C /usr/local/sap

# pyRFC needs SAPNWRFC_HOME at build time, and the SDK libs on the loader path
export SAPNWRFC_HOME=/usr/local/sap/nwrfcsdk
echo "$SAPNWRFC_HOME/lib" > /etc/ld.so.conf.d/nwrfcsdk.conf
ldconfig

# Install the package (compiles against the SDK)
pip install pyrfc
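
In case it helps, here is one possible way to stage the pieces from a notebook (the paths below are examples, and install-pyrfc.sh is assumed to exist locally on the driver):

# Copy the SDK archive to the location the init script expects,
# and register the init script itself on DBFS (example paths)
dbutils.fs.cp("file:/tmp/SAP-SDK.tar.gz", "dbfs:/FileStore/SAP-SDK.tar.gz")
dbutils.fs.put(
    "dbfs:/databricks/scripts/install-pyrfc.sh",
    open("/tmp/install-pyrfc.sh").read(),
    True,
)

Then reference dbfs:/databricks/scripts/install-pyrfc.sh under the cluster's Advanced Options > Init Scripts so it runs on every node at startup.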
Alex Ott
  • Thank you, worked like a charm. I also ended up getting the whl package to work, with env variables set to the DBFS path. – Wonseok Choi Mar 15 '23 at 14:45
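
Following up on the comment: once the cluster is up with the init script attached, a minimal connectivity check from a notebook could look like this (all connection parameters are placeholders for your SAP system):

from pyrfc import Connection

# Placeholder connection parameters; replace with your SAP system's values
conn = Connection(
    ashost="sap-host.example.com",
    sysnr="00",
    client="100",
    user="RFC_USER",
    passwd="********",
)

# STFC_CONNECTION is a standard SAP test function module that echoes its input
result = conn.call("STFC_CONNECTION", REQUTEXT="Hello from Databricks")
print(result["ECHOTEXT"])
conn.close()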