
I have a big Python project with several top-level packages:

-model
-utils
-compute

My Ray remote code is a function in the compute folder, and the remote task needs to run code from model and utils.

Currently, I'm getting "no such module" errors for the different project folders.

import ray

from utils.osops import run_command
from model.model_desc import ModelInsance
from compute.ray_remote import 

@ray.remote
def run_eval_remote(cmd_data, model_json):
    model_ins = ModelInsance.read_from_json(model_json)
    run_command(model_ins.bash_cmd)
    # do some more stuff
    return some_value

How do I do this properly?

This is the stack trace:

  "/Users/me/proj/compute/evaluator_ray.py", line 178, in <listcomp>
ray_res = [self.eval_instance(instance, eval_metric) for instance in mutations_for_search]
  File "/Users/me/proj/compute/evaluator_ray.py", line 175, in eval_instance
return run_eval_remote.remote(cmd_data, instance_json)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/remote_function.py", line 114, in _remote_proxy
return self._remote(args=args, kwargs=kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 292, in _invocation_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/remote_function.py", line 202, in _remote
return client_mode_convert_function(
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 133, in client_mode_convert_function
return client_func._remote(in_args, in_kwargs, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 98, in _remote
return self.options(**option_args).remote(*args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 296, in remote
 return return_refs(ray.call_remote(self, *args, **kwargs))
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/api.py", line 103, in call_remote
return self.worker.call_remote(instance, *args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 322, in call_remote
task = instance._prepare_client_task()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 302, in _prepare_client_task
task = self.remote_stub._prepare_client_task()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 119, in _prepare_client_task
self._ensure_ref()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 115, in _ensure_ref
self._ref = ray.put(
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/api.py", line 52, in put
return self.worker.put(*args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 260, in put
out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 260, in <listcomp>
out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 280, in _put
raise cloudpickle.loads(resp.error)
ModuleNotFoundError: No module named 'compute'
Julias
  • What's the full stack trace? Have you tried adding the root directory to the python lib using the sys library? https://stackoverflow.com/a/16114586/10475762 – jhso Oct 10 '21 at 13:17
  • Actually, I didn't. I'm looking for documentation / a guide on how to distribute my code when there are additional dependencies that aren't external libraries. – Julias Oct 10 '21 at 13:52

1 Answer


I ran into a similar problem, and solved it as follows:

  • distribute the code to each node (I just git cloned the repo on each node)
  • make sure the version / branch / etc. of the code is the same on each node
  • set up a virtual env on each node, and install Ray (and the other project dependencies) in it
  • start Ray from that virtualenv on each node and join the cluster (see the sketch after this list)
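
Roughly, the per-node setup looked like the commands below. This is only a sketch, not my exact scripts: the repo URL, paths, and the head-node address 10.0.0.1 are placeholders, and 6379 is just the default port Ray uses for the head node.

# on every node (head and workers): same code, same environment
git clone https://github.com/me/proj.git && cd proj
python -m venv .venv && source .venv/bin/activate
pip install ray && pip install -r requirements.txt

# on the head node only
ray start --head --port=6379

# on every worker node, pointing at the head
ray start --address='10.0.0.1:6379'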

Now, when you kick off the job from the head node (or from outside the cluster), the dependencies are present on every node and the job runs fine.
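
On the driver side, a minimal sketch of how the job gets kicked off once the cluster is up; the address 10.0.0.1 and client port 10001 are placeholders (10001 is only Ray's default client port), and address="auto" assumes the script runs on a node that is already part of the cluster:

import ray

# running on the head node (or any node that has already joined the cluster)
ray.init(address="auto")

# or, from a machine outside the cluster, go through the Ray client
# (this matches the ray.util.client frames in the traceback above):
# ray.init("ray://10.0.0.1:10001")

# quick sanity check that every node is visible before submitting tasks
print(ray.nodes())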

A cleaner way to distribute is via containers, of course, but for my purposes, this approach worked just fine.

Deven