Questions tagged [trains]

Questions regarding the TRAINS PyPI package & server (auto-magical experiment manager & version control for AI)

TRAINS is an Auto-Magical Experiment Manager & Version Control for AI. TRAINS tracks and controls ML/DL processes by associating: code version, research projects, performance metrics, and model provenance.

23 questions
6
votes
1 answer

Can ClearML (formerly Trains) work a local server?

I am trying to get started with ClearML (formerly known as Trains). I see in the documentation that I need to have a server running, either on the ClearML platform itself, or on a remote machine using AWS etc. I would really like to bypass this…
DalyaG
  • 2,979
  • 2
  • 16
  • 19
3
votes
1 answer

How to manage datasets in ClearML Web UI?

Using a self-deployed ClearML server with the clearml-data CLI, I would like to manage (or view) my datasets in the Web UI as shown on the ClearML webpage (https://clear.ml/mlops/clearml-feature-store/): However, this feature does not show up in my…
kleka
  • 364
  • 3
  • 14
3
votes
1 answer

ClearML get max value from logged values

I use ClearML to track my TensorBoard logs (from PyTorch Lightning) during training. At a later point I start another script which connects to the existing task and does some testing. But unfortunately I do not have all the information in the second script,…
3
votes
1 answer

ClearML server IP address not used with localhost and SSH port forwarding

Trying to use clearml-server on my own Ubuntu 18.04.5. I use env variables to set the IP address of my clearml-server: export CLEARML_HOST_IP=127.0.0.1 and export TRAINS_HOST_IP=127.0.0.1. But it is still available through the external server IP. How can I…
2
votes
1 answer

How to fix trainserver empty server?

I'm trying to install an allegroai trains-server on a k8s cluster. I tried the following 3 methods: bare Linux installation, k8s manifest installation, helm installation. I followed the Linux installation to the letter, and in the k8s installations used…
Dean Light
  • 21
  • 1
2
votes
1 answer

How should Trains be used with hyper-param optimization tools like RayTune?

What could be a reasonable setup for this? Can I call Task.init() multiple times in the same execution?
Michael Litvin
  • 3,976
  • 1
  • 34
  • 40
2
votes
1 answer

Trains: Can I reset the status of a task? (from 'Aborted' back to 'Running')

I had to stop training in the middle, which set the Trains status to Aborted. Later I continued it from the last checkpoint, but the status remained Aborted. Furthermore, automatic training metrics stopped appearing in the dashboard (though custom…
Michael Litvin
  • 3,976
  • 1
  • 34
  • 40
2
votes
2 answers

Tracking separate train/test processes with Trains

In my setup, I run a script that trains a model and starts generating checkpoints. Another script watches for new checkpoints and evaluates them. The scripts run in parallel, so evaluation is just a step behind training. What's the right Trains…
Michael Litvin
  • 3,976
  • 1
  • 34
  • 40
2
votes
1 answer

Parallel Coordinates Plot in TRAINS

Is there a way to create a parallel coordinates plot in the TRAINS (https://github.com/allegroai/trains) package to compare several hyper-parameters with respect to a specific metric?
Majd
  • 21
  • 4
1
vote
1 answer

ClearML Web UI custom column not persistent

I'm using the experiments page of a project in the ClearML Web UI to visualize some custom metrics. Therefore I've customized my table view (https://allegro.ai/clearml/docs/docs/webapp/webapp_exp_table.html?highlight=customize#adding-metrics) But…
1
vote
1 answer

ClearML multiple tasks in single script changes logged value names

I trained multiple models with different configurations for a custom hyperparameter search. I use pytorch_lightning and its logging (TensorboardLogger). When running my training script after Task.init(), ClearML auto-creates a Task and connects the…
1
vote
1 answer

ClearML SSH port forwarding fileserver not available in Web UI

Trying to use clearml-server on my own Ubuntu 18.04.5 with SSH port forwarding and not being able to see my debug samples. My setup: ClearML server on hostA; SSH tunnel connections to access the Web App from my working machine via localhost:18080; Web App:…
1
vote
1 answer

Trains: reusing previous task id

I am using reuse_last_task_id=True to overwrite an existing task (with the same project and task name). But the experiment contains the torch model and therefore does not overwrite the existing task but creates a new one. How can I detach the model from…
kyc12
  • 349
  • 2
  • 15
1
vote
3 answers

pip install trains fails

Upon running pip install trains in my virtual env, I am getting: ERROR: Command errored out with exit status 1: command: /home/epdadmin/noam/code/venv_linux/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] =…
Gulzar
  • 23,452
  • 27
  • 113
  • 201
1
vote
1 answer

Will Trains automagically log Tensorboard HParams?

I know that it's possible to send hyper-params as a dictionary to Trains. But can it also automagically log hyper-params that are logged using the TF2 HParams module? Edit: This is done in the HParams tutorial using hp.hparams(hparams).
Michael Litvin
  • 3,976
  • 1
  • 34
  • 40