Questions tagged [ray]

Ray is a library for writing parallel and distributed Python applications. It scales from your laptop to a large cluster, has a simple yet flexible API, and provides high performance out of the box.

Its core API provides a simple way to take arbitrary Python functions and classes and execute them in a distributed setting.
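As a minimal sketch of what "taking arbitrary Python functions and classes and executing them in a distributed setting" can look like, the example below wraps a plain function as a Ray task and a plain class as a Ray actor. The names `square`, `Counter`, and `run_on_ray` are hypothetical and chosen only for illustration; the sketch assumes Ray is installed and that a local Ray instance can be started on the machine.

```python
def square(x):
    """Plain Python function; nothing Ray-specific here."""
    return x * x


class Counter:
    """Plain Python class that keeps a running count."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


def run_on_ray(values):
    """Hypothetical helper: run square() as Ray tasks and Counter as a Ray actor."""
    # Imported lazily so the plain definitions above work without Ray installed.
    import ray

    # Start a local Ray instance, or reuse one that is already running.
    ray.init(ignore_reinit_error=True)

    # ray.remote() turns a function into a task and a class into an actor.
    remote_square = ray.remote(square)
    remote_counter_cls = ray.remote(Counter)

    # .remote() schedules work asynchronously and returns object references.
    refs = [remote_square.remote(v) for v in values]
    counter = remote_counter_cls.remote()
    count_ref = counter.increment.remote()

    # ray.get() blocks until the results are available.
    return ray.get(refs), ray.get(count_ref)
```

With Ray available, calling `run_on_ray([1, 2, 3])` should return the squared values in input order together with the counter's first value. The more common spelling is the `@ray.remote` decorator; the explicit `ray.remote(...)` call is used here only so the undecorated function and class stay usable as ordinary Python.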

Ray also includes a number of powerful libraries:

  • Cluster Autoscaling: Automatically configure, launch, and manage clusters and experiments on AWS or GCP.
  • Hyperparameter Tuning: Automatically run experiments, tune hyperparameters, and visualize results with Ray Tune.
  • Reinforcement Learning: RLlib is a state-of-the-art platform for reinforcement learning research as well as reinforcement learning in practice.
  • Distributed Pandas: Modin provides a faster dataframe library with the same API as Pandas.
702 questions
184
votes
15 answers

TypeError: Descriptors cannot not be created directly

I tried to install Ray, but it gave an error: TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately…
hasu33
  • 1,891
  • 2
  • 4
  • 8
31
votes
2 answers

How exactly does Ray share data to workers?

There are many simple tutorials and also SO questions and answers out there which claim that Ray somehow shares data with the workers, but none of these go into the exact details of what gets shared how on which OS. For example in this SO answer:…
jpp1
  • 2,019
  • 3
  • 22
  • 43
18
votes
2 answers

How to fix the constantly growing memory usage of ray?

I started using ray for distributed machine learning and I already have some issues. The memory usage simply keeps growing until the program crashes. Although I clear the list constantly, the memory is somehow leaking. Any idea why? My specs: OS…
TRZUKLO
  • 193
  • 1
  • 5
18
votes
3 answers

Cannot install RAY

Ray library from RISE lab (https://rise.cs.berkeley.edu/blog/pandas-on-ray/). I am using Windows 10 Pro, 64-bit and running these scripts from the Anaconda prompt. I have tried both pip install ray and pip3 install ray with the same…
cube
  • 345
  • 1
  • 2
  • 9
15
votes
3 answers

How can I use the python logging in Ray?

I use the logging module in the main function/process and it works well, but it does not seem to work in an Actor process/subprocess. How can I make it work? In the sample code below, logging.info works in the main process but fails in the worker process.…
Han Zheng
  • 309
  • 2
  • 8
12
votes
3 answers

Change Logdir of Ray RLlib Training instead of ~/ray_results

I'm using Ray & RLlib to train RL agents on an Ubuntu system. Tensorboard is used to monitor the training progress by pointing it to ~/ray_results where all the log files for all runs are stored. Ray Tune is not being used. For example, on starting…
Nyxynyx
  • 61,411
  • 155
  • 482
  • 830
11
votes
1 answer

Is ray `num_cpus` used to actually allocate CPUs?

When using the ray framework, there is an option to select the number of CPUs required for a task, as explained here. Ex: @ray.remote(num_cpus=4) def f(): return 1. However, it is unclear whether there is going to be actual CPU allocation:…
Phylliade
  • 1,667
  • 4
  • 19
  • 27
9
votes
0 answers

GPU memory is empty, but CUDA out of memory error occurs

While training this code with Ray Tune (1 GPU per trial), a CUDA out of memory error occurred on GPU:0,1 after a few hours of training (about 20 trials). Even after terminating the training process, the GPUs still give out of memory errors. As…
Kjyong
  • 195
  • 1
  • 8
8
votes
2 answers

Out of Memory with RAY Python Framework

I have created a simple remote function with ray that utilizes very little memory. However, after running for a short period of time the memory increases steadily and I get a RayOutOfMemoryError Exception. The following code is a VERY simple…
8
votes
3 answers

What is ray::IDLE and why are some of the workers running out of memory?

I'm running ray on EC2. I am running workers on c5.large instances, which have ~4G of RAM. When I run many jobs, I see these error messages: File "python/ray/_raylet.pyx", line 631, in ray._raylet.execute_task File…
Henry Henrinson
  • 5,203
  • 7
  • 44
  • 76
8
votes
1 answer

Is ray thread safe?

Assume that a ray actor is defined as below @ray.remote class Buffer: def __init__(self): self.memory = np.zeros(10) def modify_data(self, indices, values): self.memory[indices] = values def sample(self, size): …
Maybe
  • 2,129
  • 5
  • 25
  • 45
8
votes
2 answers

rllib use custom registered environments

The RLlib docs provide some information about how to create and train a custom environment. There is some information about registering that environment, but I guess it needs to work differently than gym registration. I'm testing this out working with…
KindaTechy
  • 1,041
  • 9
  • 25
7
votes
2 answers

Not all Ray CLI dependencies were found

I have installed the ray module and I get this warning all the time: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via pip install 'ray[default]'. Please update your…
Farhang Amaji
  • 742
  • 11
  • 24
7
votes
4 answers

Error while importing library "modin" in Python 3.6

import modin.pandas as pd. I am importing the modin.pandas library on my Windows 10 machine but get the error "AttributeError: module 'ray' has no attribute 'utils'". Did I miss anything while installing the modin library?
Learnings
  • 2,780
  • 9
  • 35
  • 55
7
votes
1 answer

Checkpoint best model for a trial in ray tune

So I just ran a tune experiment and got the following output: +--------------------+------------+-------+-------------+----------------+--------+------------+ | Trial name | status | loc | lr | weight_decay | loss | …
Kiran Sanjeevan
  • 149
  • 1
  • 7