
I am running my Dask server on an HPC system where a module with all the basics needed to run Dask is available, and I load that module in a Jupyter notebook. I would like to run some processing tasks using Dask together with modules that are not available in Dask's base environment. For that I have created a custom environment using conda. Is there an easy way to link this new conda environment to the Dask client before running my task?

I have tried using

from dask.distributed import Client, LocalCluster

client = Client(scheduler_file=schedule_json)
print(client)
client.upload_file('condaenvfile.tar')

I have also tried `client.run(os.system, 'conda install -c conda-forge package -y')`, but I still get a "module not found" error.


I am making my problem clearer below so that I can figure out whether there are any other alternatives to handle such issues.

import dask
from dask.distributed import Client
import skimage

client = Client(scheduler_file=schedule_json)


def myfunc(param):
    # process using skimage
    ...


r = []
for param in param_list:  # param_list: the list of inputs to process
    myres = dask.delayed(myfunc)(param)
    r.append(myres)

allres = dask.compute(*r)

In the above example, the Dask module runs in an HPC environment over which I have no control; I can only load the module. I have my own conda environment inside my user profile, and I have to run some processing with scikit-learn (and other modules) on the Dask workers. What would be a workaround for such an issue?

PUJA

1 Answer


Once Dask is running you can't switch out the underlying Python environment. Instead, you should build an environment with all the libraries and dependencies you need and run from that newly created env. To help with creating the environment I would recommend using conda-pack. You can also modify an existing environment, but I would not recommend it. If you care deeply about this issue you might be interested in https://github.com/dask/distributed/issues/3111
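As a rough sketch of that workflow (the environment name, package list, and scheduler-file path below are placeholders, not part of the original answer): pack an environment that contains dask plus your extra libraries, unpack it on the compute nodes, and start workers from it so that they join the existing scheduler.

# A sketch of the conda-pack workflow; the env name and paths are hypothetical.
# 1) On a login node, build and pack an environment that includes
#    dask/distributed plus the extra libraries you need:
#
#      conda create -n myenv python=3.8 dask distributed scikit-image scikit-learn
#      conda install -n myenv conda-pack
#      conda pack -n myenv -o myenv.tar.gz
#
# 2) On each compute node, unpack and activate it, then start a worker
#    against the shared scheduler file:
#
#      mkdir -p ~/myenv && tar -xzf myenv.tar.gz -C ~/myenv
#      source ~/myenv/bin/activate
#      conda-unpack
#      dask-worker --scheduler-file /path/to/scheduler.json
#
# 3) Back in the notebook, connect as before; tasks now run on workers
#    that were started from your packed environment:
from dask.distributed import Client

client = Client(scheduler_file="/path/to/scheduler.json")  # placeholder path
print(client)

The key point is that the workers themselves have to be started from the environment that contains the extra libraries; uploading files to workers that are already running cannot swap the environment out.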

quasiben
  • Thanks @quasiben for your suggestion. Now my question is: where do `client.run()` and `client.upload_file()` come in? Aren't those meant to be used for exactly this purpose? I am a bit confused. Also, I have tried to explain my problem more explicitly above in case there are other alternatives. – PUJA Jun 23 '20 at 08:52
  • `client.run` can be useful for all sorts of things -- any time you want to run something once on every worker. `client.upload_file` is used for adding a library to an existing dask process, but it is not robust. Still, nothing here will swap out an entire environment; it can only modify an existing one (see the sketch after these comments). – quasiben Jun 23 '20 at 13:46
  • Thanks for the clarification. So `client.upload_file()` adds a library temporarily, only for the duration of the dask process; do we need write access to the environment where dask is running? – PUJA Jun 23 '20 at 13:58
  • I believe it uploads to a temporary directory, not the env from which dask is running. I'd recommend reading through: https://distributed.readthedocs.io/en/latest/api.html#distributed.executor.Executor.upload_file and https://stackoverflow.com/questions/39295200/can-i-use-functions-imported-from-py-files-in-dask-distributed – quasiben Jun 23 '20 at 14:09
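To illustrate the distinction discussed in these comments, here is a minimal sketch (the scheduler path, the my_helpers module, and its process function are hypothetical): `client.run` executes a function once on every worker, which is handy for inspecting worker environments, while `client.upload_file` ships a small local .py (or .zip/.egg) file to the workers, not a whole environment.

import sys
from dask.distributed import Client

client = Client(scheduler_file="/path/to/scheduler.json")  # placeholder path

# client.run: execute a function once on every worker and collect the results
# as {worker_address: value}. Useful to see which Python each worker runs:
print(client.run(lambda: sys.executable))

# client.upload_file: ship a small local module to all workers. This works for
# single .py files or zipped packages, not for a conda environment tarball.
client.upload_file("my_helpers.py")  # hypothetical local file

def use_helpers(x):
    import my_helpers  # importable on the worker only after the upload
    return my_helpers.process(x)  # hypothetical function

print(client.submit(use_helpers, 42).result())

Note that the upload only reaches workers connected at the time of the call; workers that join later will not have the file, which is part of why it is described above as not robust.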