I am trying to use our Azure Databricks clusters from Visual Studio running on a virtual machine. I am following the steps described here:
1. Set up the cluster
I set up a cluster with runtime 9.1 and specified the advanced options as instructed. I set the port to 8787.
The corresponding Python version for this cluster is 3.8.10.
2. Create a conda environment on my virtual machine with the same Python version:
3. Activate the conda environment:
4. Install databricks-connect with the version matching the runtime of the newly created cluster:
5. Configure databricks-connect:
Here I specify the host, token, cluster ID and organisation ID, and make sure that the port is also 8787. I cannot copy the other parameters here for privacy reasons, but they should all be fine.
6. Select the correct conda environment in Visual Studio.
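Before digging further, the local side of the steps above can be sanity-checked from Python. This is a minimal sketch under assumptions: it assumes the legacy databricks-connect client stores its settings as JSON in ~/.databricks-connect with the keys host, token, cluster_id, org_id and port, and the helper names (check_python_version, check_config, load_config) are hypothetical. It checks that the local interpreter's major.minor version matches the cluster's Python 3.8 and that the configured port is 8787.

```python
import json
import sys
from pathlib import Path

# Assumption: the legacy databricks-connect client keeps its settings here.
CONFIG_PATH = Path.home() / ".databricks-connect"

# Python version of the runtime 9.1 cluster (3.8.10); only major.minor
# has to match on the client side.
CLUSTER_PYTHON = (3, 8)
EXPECTED_PORT = "8787"
REQUIRED_KEYS = ("host", "token", "cluster_id", "org_id", "port")


def check_python_version(version_info=sys.version_info, expected=CLUSTER_PYTHON):
    """Return a problem description, or None if major.minor matches."""
    actual = tuple(version_info[:2])
    if actual != expected:
        return (f"local Python {actual[0]}.{actual[1]} != "
                f"cluster Python {expected[0]}.{expected[1]}")
    return None


def check_config(cfg, expected_port=EXPECTED_PORT):
    """Return a list of problems found in a databricks-connect config dict."""
    problems = [f"missing or empty key: {key}"
                for key in REQUIRED_KEYS if not cfg.get(key)]
    # The port may be stored as a string or an int, so compare as strings.
    if cfg.get("port") and str(cfg["port"]) != expected_port:
        problems.append(f"port is {cfg['port']}, expected {expected_port}")
    return problems


def load_config(path=CONFIG_PATH):
    """Read the JSON config file; return None if it does not exist."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(check_python_version() or "Python version matches the cluster")
    cfg = load_config()
    if cfg is None:
        print(f"no config file at {CONFIG_PATH}")
    else:
        print(check_config(cfg) or "config looks complete")
```

Running this inside the activated conda environment should report no problems if steps 2 to 5 took effect; a version mismatch would mean Visual Studio or the shell is using a different interpreter.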
So far everything works like a charm. However, when I now try to create a Spark session in Visual Studio, it gets stuck: it never finishes executing the last line, but it doesn't raise an error either.
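Because the call hangs silently, one way to make the failure visible is to run the session creation with a time limit so the script reports a timeout instead of blocking forever. This is a minimal sketch using a daemon thread; call_with_timeout and create_spark_session are hypothetical helpers, and the timeout value is an arbitrary choice. The worker thread is not killed on timeout, only abandoned, so the connection attempt may keep running in the background.

```python
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns fn's result, re-raises its exception, or raises TimeoutError
    if it is still running. The worker is not stopped on timeout; being a
    daemon thread, it will not keep the interpreter alive at exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def create_spark_session():
    # Imported lazily so this module loads even where pyspark is missing;
    # databricks-connect provides the pyspark package.
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()
```

With databricks-connect installed, the session would be created with call_with_timeout(create_spark_session, 120). A TimeoutError at least confirms that the hang is inside session creation rather than in later code.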
When I run databricks-connect test, I get the following output. I have tried setting the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables, but then the error message from databricks-connect test always changes to 'cannot find path specified' when it tests the python command.
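A 'cannot find path specified' message plausibly means (this is an assumption, not confirmed by the output) that PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON points at a location that does not exist. A quick check from the activated conda environment is whether each variable resolves to an existing executable and whether it is the interpreter currently running. The helper name describe_pyspark_python is hypothetical.

```python
import os
import shutil
import sys

PYSPARK_VARS = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def describe_pyspark_python(env=None):
    """Map each PySpark interpreter variable to a short status string."""
    env = os.environ if env is None else env
    report = {}
    for name in PYSPARK_VARS:
        value = env.get(name)
        if not value:
            report[name] = "not set"
            continue
        # shutil.which accepts a bare command name or a full path and
        # returns None if no executable file is found there.
        resolved = shutil.which(value)
        if resolved is None:
            report[name] = f"NOT FOUND: {value!r}"
        elif os.path.realpath(resolved) == os.path.realpath(sys.executable):
            report[name] = f"ok, same as current interpreter: {resolved}"
        else:
            report[name] = f"found, but differs from {sys.executable}: {resolved}"
    return report


if __name__ == "__main__":
    for name, status in describe_pyspark_python().items():
        print(f"{name}: {status}")
```

Running it in the same shell as databricks-connect test shows whether a variable points somewhere that does not exist.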
I have also tried adding the following code in Visual Studio, as suggested here:
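import os
import sys

# Point PySpark's worker and driver Python at the interpreter running this
# script (i.e. the activated conda environment).
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable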
However, nothing has helped so far. Is anyone familiar with this sort of error and able to help?