Switch the channel order and use Mamba. Specifically, I note that pyspark=3.3.1
is only available from Conda Forge, so the conda-forge
channel should come first to avoid masking issues under channel_priority: strict.
Mamba is also faster, gives clearer error reporting, and its maintainers are very responsive.
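If you're not sure which priority setting is in effect on a given machine, you can check (and, if you like, set) it with the standard conda config commands; this is just a quick sketch:
## see which channel_priority is currently active
conda config --show channel_priority
## strict priority is what makes channel order matter most
conda config --set channel_priority strict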
test_syn_spark_3_3_1.yaml
name: test_syn_spark_3_3_1
channels:
- conda-forge
- defaults
# rest the same...
Create with Mamba (or micromamba):
## install mamba if needed
## conda install -n base -c conda-forge mamba
mamba env create -n test_syn_spark_3_3_1 -f test_syn_spark_3_3_1.yaml
This takes a few minutes on my machine, most of which is download time.
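If you prefer micromamba (no base environment needed at all), the equivalent is roughly the following; exact flags can vary a little between micromamba versions, so treat this as a sketch:
## micromamba reads the same YAML spec
micromamba create -n test_syn_spark_3_3_1 -f test_syn_spark_3_3_1.yaml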
Other Thoughts
- I wouldn't ever impose a fixed constraint on `pip` or `setuptools` unless there is a specific bug you are avoiding. I'd probably at least loosen them to lower bounds.
- Conda Forge is fully self-sufficient these days: I would not only drop `defaults`, but also insulate against any channel mixing with the `nodefaults` directive.
- I notice the `defaults` channel prefers MKL for BLAS on x64, whereas Conda Forge defaults to OpenBLAS. So you may want to explicitly declare your preference (e.g., `accelerate` on macOS arm64, `mkl` on Intel); a quick verification sketch follows this list.
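If you want to confirm which BLAS implementation actually landed in the environment, a couple of quick checks (assuming the environment name used above, and noting that numpy comes in via pandas) are:
## list any BLAS-related packages in the environment
conda list -n test_syn_spark_3_3_1 "blas|mkl|openblas"
## numpy reports the BLAS/LAPACK it was linked against
conda run -n test_syn_spark_3_3_1 python -c "import numpy; numpy.show_config()"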
In summary, this is how I would write the YAML:
name: test_syn_spark_3_3_1
channels:
- conda-forge
- nodefaults # insulate from user config
dependencies:
## Python Core
- python=3.10
- pip >=23.0
- setuptools >=65.0
## BLAS
## adjust for hardware/preference
- blas=*=mkl
## Conda Python pkgs
- pandas=1.5
- pyarrow=11.0.0
- pyspark=3.3.1
## PyPI pkgs
- pip:
- azure-common==1.1.28
- azure-core==1.26.1
- azure-datalake-store==0.0.51
- azure-identity==1.7.0
- azure-mgmt-core==1.3.2
- azure-mgmt-resource==21.2.1
- azure-mgmt-storage==20.1.0
- azure-storage-blob==12.16.0
- azure-mgmt-authorization==2.0.0
- azure-mgmt-keyvault==10.1.0
- azure-storage-file-datalake==12.11.0
- check-wheel-contents==0.4.0
- pyarrowfs-adlgen2==0.2.4
- wheel-filename==1.4.1
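Once the environment is built, a quick smoke test helps catch any conda/pip mixing problems early. This is only a sketch, using conda run and a few of the packages declared above:
## quick smoke test of the key imports and versions
conda run -n test_syn_spark_3_3_1 python -c "import pyspark, pyarrow, pandas, azure.storage.blob; print(pyspark.__version__, pyarrow.__version__, pandas.__version__)"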