I am trying to use our Azure Databricks clusters from Visual Studio running on a virtual machine. I am following the steps described here.

1. Setup cluster

I set up a cluster with runtime 9.1 and specified the advanced options as required. I set the port to 8787.

The corresponding Python version for this cluster is 3.8.10.

2. Create a conda environment on my virtual machine with the same Python version
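A quick way to confirm that the interpreter actually in use matches the cluster is to compare versions from inside Python. This is a minimal sketch, assuming the cluster version 3.8.10 stated above; the helper name `same_minor` is made up for illustration. As far as I understand the Databricks Connect requirements, it is the minor version (3.8 here) that has to agree.

```python
import sys

CLUSTER_PYTHON = (3, 8, 10)  # version reported for the cluster above


def same_minor(local, cluster):
    """Return True when both version tuples share major and minor numbers."""
    return tuple(local[:2]) == tuple(cluster[:2])


# sys.executable shows which interpreter (and so which conda env) is running.
print("interpreter:", sys.executable)
print("local version:", tuple(sys.version_info[:3]))
print("minor version matches cluster:", same_minor(sys.version_info, CLUSTER_PYTHON))
```

Running this from Visual Studio as well as from the activated conda prompt shows whether both use the same interpreter.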

3. Activate the conda environment

4. Install databricks-connect in the version that matches the runtime of the newly created cluster
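To double-check which client version ended up in the environment, the installed package metadata can be read from Python. This is a sketch only: the runtime value 9.1 comes from step 1, the helper names are hypothetical, and comparing just the first two version components is my assumption about what "same version" means here.

```python
from importlib import metadata  # part of the standard library from Python 3.8

RUNTIME = "9.1"  # Databricks runtime of the cluster from step 1


def client_matches_runtime(client_version, runtime):
    """True when client and runtime agree on major.minor (assumed rule)."""
    return client_version.split(".")[:2] == runtime.split(".")[:2]


def installed_client_version():
    """Return the installed databricks-connect version, or None if absent."""
    try:
        return metadata.version("databricks-connect")
    except metadata.PackageNotFoundError:
        return None


version = installed_client_version()
if version is None:
    print("databricks-connect is not installed in this environment")
else:
    print(version, "matches runtime:", client_matches_runtime(version, RUNTIME))
```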

5. Set the configuration for databricks-connect

Here I specify the Host, Token, Cluster ID and Organisation ID, and make sure that the port is also 8787. I cannot share the other parameters for privacy reasons, but they should all be correct.
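Since the values themselves cannot be shared, a small script can at least confirm that every field is filled in and that the port agrees with the cluster. This is a sketch under assumptions: that the configure step stores its settings as JSON in `~/.databricks-connect` with the keys listed below, and that 8787 is the expected port. The function name `config_problems` is made up. It never prints the token or any other value except the port.

```python
import json
from pathlib import Path

# Assumed key names in the stored configuration.
EXPECTED_KEYS = ("host", "token", "cluster_id", "org_id", "port")


def config_problems(config, expected_port="8787"):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"missing or empty: {key}" for key in EXPECTED_KEYS if not config.get(key)]
    port = config.get("port")
    if port and str(port) != str(expected_port):
        problems.append(f"port is {port}, expected {expected_port}")
    return problems


config_path = Path.home() / ".databricks-connect"  # assumed location
if config_path.is_file():
    try:
        found = config_problems(json.loads(config_path.read_text()))
        print(found or "config looks complete")
    except ValueError:
        print("config file is not valid JSON")
else:
    print("no config file found at", config_path)
```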

6. Select the right conda environment in Visual Studio

So far everything works like a charm. However, when I now try to create a Spark session in Visual Studio, it hangs: the last line never finishes executing, and no error is raised either.

[Screenshot: code that creates the Spark session]
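Because the call neither returns nor fails, one option for narrowing it down is to wrap it in a timeout so the hang turns into an explicit error. This is a generic sketch, not something specific to Databricks Connect: `run_with_timeout` is a hypothetical helper, and the 120 seconds in the commented usage is an arbitrary choice.

```python
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # hand the exception back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Intended use (needs pyspark from databricks-connect, so left commented out):
# from pyspark.sql import SparkSession
# spark = run_with_timeout(lambda: SparkSession.builder.getOrCreate(), 120)
```

Note that the worker thread is not cancelled on timeout; it keeps running in the background until the process exits, so this is only useful for diagnosis.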

When I run databricks-connect test I get the following output. I have tried setting the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables, but then the error message from databricks-connect test always changes to 'cannot find path specified' at the step where it tests the python command.

[Screenshot: output of databricks-connect test]
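Given the 'cannot find path specified' message, it may be worth checking whether the two variables actually resolve to an interpreter. This is a minimal sketch; `unresolved_interpreters` is a hypothetical helper, and it only tests whether each value can be found as an executable (as a full path or on PATH), which may not be the only cause of that message.

```python
import os
import shutil
import sys

PYSPARK_VARS = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def unresolved_interpreters(env, names=PYSPARK_VARS):
    """Return {name: value} for variables that are set but resolve to no executable."""
    return {
        name: env[name]
        for name in names
        if env.get(name) and shutil.which(env[name]) is None
    }


print("current interpreter:", sys.executable)
print(unresolved_interpreters(os.environ) or "all set variables resolve to an executable")
```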

I have also tried adding this to the code in Visual Studio, as suggested here:

import os
import sys
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

However, nothing has helped so far. Is anyone familiar with this kind of problem and able to help?

user3387899