Questions tagged [ray-tune]

72 questions
12 votes, 3 answers

Change Logdir of Ray RLlib Training instead of ~/ray_results

I'm using Ray & RLlib to train RL agents on an Ubuntu system. Tensorboard is used to monitor the training progress by pointing it to ~/ray_results where all the log files for all runs are stored. Ray Tune is not being used. For example, on starting…
Nyxynyx
9 votes, 3 answers

Raytune is throwing error: "module 'pickle' has no attribute 'PickleBuffer'" when attempting hyperparameter search

I am more or less following this example to integrate the ray tune hyperparameter library with the huggingface transformers library using my own dataset. Here is my script: import ray from ray import tune from ray.tune import CLIReporter from…
Luca Guarro
7 votes, 1 answer

Checkpoint best model for a trial in ray tune

So I just ran a tune experiment and got the following output: +--------------------+------------+-------+-------------+----------------+--------+------------+ | Trial name | status | loc | lr | weight_decay | loss | …
Kiran Sanjeevan
4 votes, 1 answer

Ray[tune] for pytorch TypeError: ray.cloudpickle.dumps

I am having trouble getting started with tune from Ray. I have a PyTorch model to be trained and I am trying to fine-tune using this library. I am very new to Raytune so please bear with me and help me understand where the error stems from. my…
CtrlMj
4 votes, 1 answer

Out of memory at every second trial using Ray Tune

I am tuning the hyperparameters using Ray Tune. The model is built with TensorFlow and occupies a large part of the available GPU memory. I noticed that every second trial reports an out-of-memory error. It looks like the memory is being…
3 votes, 1 answer

How do I checkpoint only the best model from a ray tune run?

NOTE: To some extent, this was already asked here but my question tackles a different aspect of getting the best checkpoint. In the referenced question, the author only desired to retrieve the best checkpoint from a set of checkpoints after the ray…
c0mr4t
3 votes, 1 answer

How to define SearchAlgorithm-agnostic, high-dimensional search space in Ray Tune?

I have two questions concerning Ray Tune. First, how can I define a hyperparameter search space independently of the particular SearchAlgorithm used? For instance, HyperOpt uses something like 'height': hp.uniform('height', -100, 100) whereas…
Rylan Schaeffer
2 votes, 0 answers

Raytune tune.choice TypeError: int() argument must be a string, a bytes-like object or a number, not 'Categorical'

I am trying hyperparameter tuning with Ray Tune. My current tune_config is shown in the code below: self.tune_config = { "batch_size": tune.choice([128, 256, 512]), "epoch": tune.choice([50, 100, 200]), "sequence_length": tune.choice([128,…
hjsg1010
2 votes, 0 answers

ValueError: The actor ImplicitFunc is too large (106 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB)

While using Ray Tune to find the optimal hyperparameters, I encountered the following error: ValueError: The actor ImplicitFunc is too large (106 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly…
Echolst 1
2 votes, 0 answers

The actor died unexpectedly before finishing this task (Ray 1.7.0, SageMaker)

I am running Ray RLlib on SageMaker with an 8-core CPU using the sagemaker_rl library, and I set num_workers to 7. After a long execution I get the error "The actor died unexpectedly before finishing this task". class MyLauncher(SageMakerRayLauncher): def…
2 votes, 0 answers

Nested hyperparameters in Ray Tune?

I am using Ray Tune and I am disappointed by the lack of options for conditional / nested hyperparameters. It seems I will have to hack something together, but since I can't be the first one who had this problem I'm wondering how other people solved…
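As an illustration of what "conditional / nested hyperparameters" means here, the following is a minimal sketch in plain Python that deliberately uses no Ray Tune calls. The function name `sample_config`, the parameter names (`optimizer`, `lr`, `momentum`, `beta1`) and their ranges are all hypothetical; the point is only that which child parameters exist depends on the value drawn for the parent.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration whose child parameters depend on the parent choice."""
    # Parent parameter: decides which branch of the search space is active.
    optimizer = rng.choice(["sgd", "adam"])
    # Shared parameter: sampled log-uniformly between 1e-4 and 1e-1.
    config = {"optimizer": optimizer, "lr": 10 ** rng.uniform(-4, -1)}
    if optimizer == "sgd":
        # Only meaningful when the parent is "sgd".
        config["momentum"] = rng.uniform(0.0, 0.99)
    else:
        # Only meaningful when the parent is "adam".
        config["beta1"] = rng.uniform(0.8, 0.99)
    return config


rng = random.Random(0)  # fixed seed so repeated runs draw the same configurations
for _ in range(3):
    print(sample_config(rng))
```

Each sampled dictionary contains exactly one of `momentum` or `beta1`, never both, which is the property a flat search space cannot express.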
Florian Dietz
2 votes, 0 answers

How to get the optimal number of iterations in Ray Tune

If I'm using Ray Tune without a scheduler, how can I determine the number of iterations after which the network starts to overfit? That is, I need the iteration at which the model achieved its best score on the validation set.
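One way to read this question, sketched in plain Python with no Ray Tune calls: if a validation score is recorded after every iteration, the iteration with the best score marks the point after which further training stopped helping. The function name `best_iteration`, its `mode` argument and the sample scores below are hypothetical and not taken from the question.

```python
from typing import Sequence


def best_iteration(scores: Sequence[float], mode: str = "max") -> int:
    """Return the 1-based iteration whose validation score is best."""
    if not scores:
        raise ValueError("scores must not be empty")
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    pick = max if mode == "max" else min
    # max/min return the first best item, so ties resolve to the earliest iteration.
    best_index = pick(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1


# Hypothetical validation accuracy recorded after each of five iterations.
val_accuracy = [0.61, 0.70, 0.74, 0.73, 0.71]
print(best_iteration(val_accuracy))
```

For a metric that should be minimized, such as validation loss, pass `mode="min"`.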
2 votes, 1 answer

When using ray tune, value defined in config returns a non-float value

I'm new to Ray Tune. I defined my Ray config as below: ray_config = { "estimator/dropout_rate": tune.uniform(0.0, 0.3), "estimator/d_model": tune.choice([64]), "estimator/num_encoder_layers": tune.choice([3]), …
Ashikandi
2 votes, 1 answer

What does 'output_dir' mean in transformers.TrainingArguments?

The Hugging Face documentation describes it as 'The output directory where the model predictions and checkpoints will be written'. I don't quite understand what that means. Do I have to create any file for that?
2 votes, 0 answers

Insufficient cluster resources to launch trial - has only 0 GPUs

I am following this tutorial (which is basically this) in order to use Ray Tune for hyperparameter optimization. My model is training fine on the GPU without the optimization but now I want to optimize. I applied the tutorial to my code but when I…
m02ph3u5