I am using Anaconda 3 on Windows 10 x64. I used the instructions here (see "Setting environment variables") to set environment variables specific to two particular conda environments, one of which is my base environment.
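For concreteness, this is the mechanism I am referring to, assuming the `conda env config vars` approach from that doc (`myenv` and `MY_VAR` here are just placeholders, not my real names):

```
:: from an Anaconda Prompt, with the target environment active
conda activate myenv
conda env config vars set MY_VAR=some_value
:: re-activation is required for the change to take effect
conda activate myenv
:: list the variables stored for this environment
conda env config vars list
```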
So far this seems to be working great: each environment picks up the variables I intended, and I can even use both environments simultaneously in different applications without any "cross-contamination."
However, I am wondering whether I am creating any potential unintended consequences or learning bad habits by doing this. What might those unintended consequences or bad habits be, if any?
Specifics that may (or may not) matter: my use case involves two Python modules:
- (in my base environment) PySpark (the version that comes with Spark when I download it from the Apache website)
- (in another environment) Databricks-Connect (which installs its own `pyspark` package and is incompatible with a standalone PySpark installation)
To get the first of these to work, I have found no alternative to setting some environment variables: at minimum, one called SPARK_HOME pointing to my Spark installation directory, and one called PYTHONPATH including the "python" subdirectory of SPARK_HOME. I have tried several suggested workarounds to avoid this (which I'm happy to go into) without success; maybe findspark could help here? I experimented with it a little.

However, once I set those variables machine-wide through the Windows interface (the PYTHONPATH one in particular, I think), databricks-connect stops working in every environment. That makes sense to me, since I'm guessing it then goes looking for PySpark in SPARK_HOME\python.
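For what it's worth, scoping those variables to the base environment rather than setting them machine-wide looks roughly like this (a sketch; `C:\path\to\spark` is a placeholder for wherever the Spark distribution is actually unpacked):

```
:: set the variables only for base, so other environments never see them
conda activate base
conda env config vars set SPARK_HOME=C:\path\to\spark
conda env config vars set PYTHONPATH=C:\path\to\spark\python
:: depending on the Spark release, the py4j zip under python\lib may
:: also need to be on PYTHONPATH
:: re-activate so the new values take effect
conda activate base
```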
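And to show what I meant by findspark: the idea I tried was to resolve the paths at runtime instead of via persistent environment variables, roughly like this (again a sketch, with the same placeholder path):

```
# sketch: let findspark locate the Spark-bundled PySpark at runtime,
# so no persistent PYTHONPATH is needed in this environment
import findspark
findspark.init(r"C:\path\to\spark")  # or findspark.init() if SPARK_HOME is set

import pyspark  # should now import the version bundled with Spark
print(pyspark.__version__)
```

If that approach worked, the base environment would need no persistent PYTHONPATH at all, which I imagine would sidestep the databricks-connect clash, but I haven't gotten it fully working yet.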