
I am using Anaconda 3 on Windows 10 x64. I used the instructions here (see "Setting environment variables") to set environment variables specific to two particular conda environments, one of which is my base environment.

So far this seems to be working great: each environment picks up its own variables and gives me the results I want. Not only that, I seem to be able to use both environments simultaneously in different applications without any "cross-contamination."

However, I am wondering whether I am creating any potential unintended consequences or learning bad habits by doing this. What might those unintended consequences or bad habits be, if any?

Specifics that may or may not matter: I want to use two Python modules:

  1. (in my base environment) PySpark (the version that comes with Spark when I download it from the Apache website)
  2. (in another environment) Databricks-Connect (which installs to a folder called "PySpark" and is incompatible with PySpark)
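Because both packages are imported as pyspark, it helps to check which one a given environment actually resolves. The sketch below asks the import system where it would load a module from, without importing it. It uses only the standard library; the helper name locate_module is made up for this example. Run it once from each environment:

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if not found.

    find_spec locates the module without executing it, so this is safe to run
    even when the module is broken or conflicts with something else.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("pyspark    :", locate_module("pyspark"))
```

If the base environment reports a path under SPARK_HOME\python and the other environment reports a path under its own site-packages, the two copies are staying separate as intended.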

To get #1 working, the only approach I have found is to set some environment variables: at a minimum SPARK_HOME, pointing to my Spark installation directory, and PYTHONPATH, including the "Python" subdirectory of SPARK_HOME. I have tried some suggestions for avoiding this (which I'm happy to go into), without success. Maybe findspark could help me? I tried it a little. However, once I set those variables through the Windows interface (in particular, I think, PYTHONPATH), databricks-connect stops working in every environment. That makes sense to me, since I'm guessing it is looking for PySpark in SPARK_HOME\Python?
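One way to avoid a machine-wide PYTHONPATH is to add Spark's Python directory to sys.path from inside the script or notebook that needs it, before importing pyspark. This is the same general idea findspark is built around, but the sketch below is hand-written, not findspark's code. It assumes the usual layout of the Apache Spark download, with the PySpark sources in SPARK_HOME\python and the Py4J archive at SPARK_HOME\python\lib\py4j-*-src.zip; the helper name add_spark_to_path is hypothetical.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home=None):
    """Make the PySpark bundled with a Spark download importable for this process only.

    Assumes the standard Spark layout: <spark_home>/python and
    <spark_home>/python/lib/py4j-*-src.zip. Nothing is written to the
    machine-wide environment, so other processes and conda environments
    are unaffected.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("pass spark_home or set SPARK_HOME")
    spark_home = os.path.abspath(spark_home)
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))

    # Insert at the front (unless already present) so this PySpark is found
    # before any other pyspark on sys.path; python_dir ends up first.
    for entry in reversed([python_dir, *py4j_zips]):
        if entry not in sys.path:
            sys.path.insert(0, entry)

    # Affects this process and its children only; PySpark reads SPARK_HOME
    # to locate the Spark installation.
    os.environ["SPARK_HOME"] = spark_home
    return [python_dir, *py4j_zips]
```

For example, at the top of a script run from the base environment, call add_spark_to_path(r"C:\path\to\spark") and then import pyspark. The databricks-connect environment never sees these paths, because nothing is set globally.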

BeeePollen

1 Answer

Editing %PYTHONPATH% at least sounds like a bad idea.

I just broke one of my conda environments by setting a %PYTHONPATH% variable via conda env config vars set PYTHONPATH=<path> (lesson learned: don't do that with conda). Afterwards all conda commands failed, so I couldn't even switch back to the base environment except by restarting my shell entirely.
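A likely reason a stray PYTHONPATH takes conda down with it: Python places PYTHONPATH entries ahead of the standard library and site-packages on sys.path, in every interpreter that sees the variable, and conda itself is a Python program. The sketch below (the helper name child_sys_path is hypothetical) starts a fresh interpreter with a given PYTHONPATH and returns its sys.path, so you can see the ordering for yourself:

```python
import os
import subprocess
import sys


def child_sys_path(pythonpath):
    """Return sys.path as seen by a fresh interpreter started with the given PYTHONPATH."""
    env = dict(os.environ)
    env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Drop the empty entry that stands for the current directory under -c.
    return [p for p in result.stdout.splitlines() if p]


if __name__ == "__main__":
    # The extra directory is listed before the standard library and site-packages.
    for entry in child_sys_path(os.path.abspath("extra_dir")):
        print(entry)
```

Anything placed in such a directory can therefore shadow modules that conda (or any other Python tool) relies on.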

To fix it, I had to temporarily unset %PYTHONPATH% with OS tools; that way basic conda commands worked again. However, the environment broke again every time I reactivated it, and removing the problematic variable with conda env config vars unset didn't work, because the variable was no longer active after I had temporarily unset it.

So to make the fix permanent, I had to remove the problematic variable from the file %CONDA_PREFIX%/conda-meta/state, which seems to be the config file that stores the variables set with conda env config vars. (I actually just deleted the file entirely, because I had nothing else set in there.)
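If other variables are stored in that file, deleting it wipes them too. The sketch below removes a single entry instead. It assumes the format conda appears to use for this file: JSON with an "env_vars" object holding the variables set via conda env config vars set. Check your own file first and back it up. The helper name remove_conda_env_var is hypothetical.

```python
import json
import os
import sys


def remove_conda_env_var(prefix, name):
    """Delete one variable from <prefix>/conda-meta/state.

    Assumes the file is JSON with an "env_vars" object (an assumption about
    conda's storage format). Returns True if the variable was present and removed.
    """
    state_path = os.path.join(prefix, "conda-meta", "state")
    with open(state_path, encoding="utf-8") as f:
        state = json.load(f)

    env_vars = state.get("env_vars", {})
    if name not in env_vars:
        return False
    del env_vars[name]

    # Write the remaining state back, keeping any other variables intact.
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python remove_var.py <conda prefix> <variable name>
    print(remove_conda_env_var(sys.argv[1], sys.argv[2]))
```

Run it from a shell where PYTHONPATH has been temporarily unset, since the Python interpreter itself is affected by the variable.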

To append something to your module path in Anaconda 3, don't use the PYTHONPATH environment variable. Instead, create a file with a .pth extension under %CONDA_PREFIX%/Lib/site-packages and list in it the paths (one per line, no quotes needed) that you want automatically appended to sys.path. That way you won't accidentally break your conda environment.
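The same step can be scripted. sysconfig.get_paths()["purelib"] returns the site-packages directory of whichever interpreter runs the script, so run it with the environment you want to change activated. The helper name write_pth and the default file name extra_paths.pth are only examples.

```python
import os
import sysconfig


def write_pth(paths, name="extra_paths.pth", site_packages=None):
    """Write a .pth file listing extra import directories, one per line.

    At startup, the site module appends each listed directory that exists to
    sys.path after site-packages, so it cannot shadow installed packages or
    the standard library the way PYTHONPATH can.
    """
    site_packages = site_packages or sysconfig.get_paths()["purelib"]
    pth_file = os.path.join(site_packages, name)
    with open(pth_file, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(os.path.abspath(p) + "\n")
    return pth_file
```

Because the file lives in a single environment's site-packages, other environments are not affected. The change takes effect in newly started interpreters.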