When multiple people are working on a Python project with Git, what is the best way to ensure that everyone's local Conda environment has an identical set of packages installed?

So far I use

conda env export > conda_env.yml

for recording the environment and

conda env update --file conda_env.yml --prune

for synchronizing an environment.
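
One detail that may matter when the exported file is shared through Git: `conda env export` also writes a `prefix:` entry holding the absolute path of the local environment, so the file can differ between machines even when the packages are identical. A minimal sketch of a filter that drops that line before the file is committed is shown here; the helper name `strip_prefix` and the sample YAML are made up for illustration.

```python
def strip_prefix(exported_yaml: str) -> str:
    """Drop the top-level ``prefix:`` line from ``conda env export`` output."""
    kept = [line for line in exported_yaml.splitlines()
            if not line.startswith('prefix:')]
    return '\n'.join(kept) + '\n'


# Hypothetical sample, shaped like the output of `conda env export`.
sample = """name: myproject
channels:
  - defaults
dependencies:
  - python=3.8.3
prefix: /home/alice/miniconda3/envs/myproject
"""

print(strip_prefix(sample), end='')
```

The function works on plain text rather than parsed YAML, so the rest of the file is passed through unchanged.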

To verify that the local environment matches the contents of conda_env.yml, I have the following test:

import difflib
from pathlib import Path
from subprocess import run

import yaml  # PyYAML


def test_conda_environment():
    """Compare the output of ``conda env export`` with the contents of ``conda_env.yml``."""

    # Execute 'conda env export' and parse its YAML output:
    cmd_result = run(['conda', 'env', 'export'], capture_output=True, check=True)
    d0 = yaml.safe_load(cmd_result.stdout.decode('utf-8'))

    # Read the saved conda environment from the YAML file:
    fn = Path('..') / 'conda_env.yml'
    with open(fn, 'rt') as fp:
        d1 = yaml.safe_load(fp)

    # Compare the two dictionaries:
    if d0['channels'] != d1['channels']:
        print(f"Conda channels differ (current vs '{fn}'): " +
              f"{d0['channels']} vs {d1['channels']}")

    # Entries are converted to str because a nested 'pip' section is a dict,
    # which can be neither sorted against strings nor hashed by difflib.
    s0 = sorted(map(str, d0['dependencies']))
    s1 = sorted(map(str, d1['dependencies']))
    if s0 != s1:
        # Differ marks lines unique to its first argument with '-' and
        # lines unique to its second argument with '+'.
        df = difflib.Differ()
        ds = df.compare(s1, s0)
        ts = f"Differences of current environment (+) and file '{fn}' (-):"
        print("\n" + ts)
        print('=' * len(ts))
        print('\n'.join(l_ for l_ in ds if not l_.startswith(' ')))
        assert False  # fail test
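
The comparison step can also be separated from the `conda` call so that it is testable on its own. The following is only a sketch under that assumption: the helper names `flatten_dependencies` and `diff_dependencies`, the `pip::` label, and the sample data are hypothetical. It expands a nested `pip` section into individual entries instead of comparing it as one opaque item.

```python
import difflib


def flatten_dependencies(dependencies):
    """Return a sorted list of strings for a parsed ``dependencies`` list.

    Plain entries are kept as they are; a nested mapping such as
    ``{'pip': [...]}`` is expanded into entries prefixed with its key.
    The ``pip::`` style prefix is only a label chosen for this sketch.
    """
    flat = []
    for entry in dependencies:
        if isinstance(entry, dict):
            for tool, packages in entry.items():
                flat.extend(f"{tool}::{package}" for package in packages)
        else:
            flat.append(str(entry))
    return sorted(flat)


def diff_dependencies(current, saved):
    """Return diff lines: '+' only in ``current``, '-' only in ``saved``."""
    differ = difflib.Differ()
    lines = differ.compare(flatten_dependencies(saved),
                           flatten_dependencies(current))
    # Keep only additions and removals; drop unchanged and '?' hint lines.
    return [line for line in lines if line.startswith(('+ ', '- '))]


# Hypothetical sample data, shaped like parsed `conda env export` output.
current = ['python=3.8.3', 'numpy=1.18.5', {'pip': ['requests==2.23.0']}]
saved = ['python=3.8.3', 'numpy=1.18.1', {'pip': ['requests==2.23.0']}]

print('\n'.join(diff_dependencies(current, saved)))
```

Inside the test, the two parsed `dependencies` lists would be passed to `diff_dependencies`, and the test would fail whenever the returned list is non-empty.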

Is there a more straightforward way?

Dietrich
  • You already have a solution that sounds like it works. The only alternatives I currently see are sharing the directory where conda installs its packages (probably a bad idea), or specifying all packages at the start and only modifying them in ways that would not run with the wrong env, so that the dev realizes they need to update the conda env – lucidbrot May 27 '20 at 10:11
  • There's [a great answer on git integration](https://stackoverflow.com/a/56787188/570918) and [a similar question on auto-updating an environment.yaml from git](https://stackoverflow.com/q/57838081/570918). This all assumes that users are working on identical platforms (e.g., **linux-64** or **osx-64**). – merv May 27 '20 at 20:31
  • @lucidbrot sharing a package cache (i.e., common `pkgs_dirs`) is a great idea if everyone is on the same server. However, directly sharing an environment (i.e., common `envs_dirs`) is a bad idea; those should be local to individuals, but if they can still be colocated to the same disk as the central `pkgs_dirs` then Conda can still leverage hardlinks (i.e., minimal duplication). – merv May 27 '20 at 20:37
  • Why check the packages using that function? Are you worried that conda won’t respect the environment.yml file when creating the environment? – AMC May 28 '20 at 05:00
  • @AMC The used packages are expected to change quite a bit over the course of the project, so the environments need to be updated from time to time. – Dietrich May 28 '20 at 09:00
  • @merv The links on git autohooks are quite helpful. I kind of hoped for an out-of-the-box solution from Anaconda, which also covers the pathological cases (e.g., switching environment with branches). – Dietrich May 28 '20 at 09:05
  • @Dietrich Ah, so it's possible that a user would forget to update the environment? You could also make an entirely new environment, no? – AMC May 28 '20 at 23:46
  • @AMC Yes, hunting down package version/dependency problems can be quite tedious if you don't expect them. Creating a new environment each time would be a robust solution. The downside might be that switching branches will always take some time, since Conda is relatively slow. Having one environment per branch might circumvent that problem. – Dietrich May 29 '20 at 11:39
  • _since Conda is relatively slow_ How slow are we talking? _Having one environment per branch might circumvent that problem._ Please forgive me, I'm a bit lost. What would be the (slower) alternative? – AMC Jun 03 '20 at 02:15

0 Answers