First I created a root dataset (A). Then I created 2 dataset branches derived from the root dataset.
How do I "merge" these 2 branches to form another dataset?
Basically, the graph looks like a diamond: the root at one end, the two branches in the middle, and the merged dataset at the other end.
Disclaimer: I'm part of the ClearML team
To merge datasets, create a new dataset and pass the IDs of both branches as its parent_datasets:
from clearml import Dataset

# Root dataset (A)
root = Dataset.create(dataset_name="root", dataset_project="some_project")
root.add_files("a.txt")
root.upload()
root.finalize()

# First branch, derived from the root
child_1 = Dataset.create(dataset_name="child_1", dataset_project="some_project", parent_datasets=[root.id])
child_1.add_files("child_1.txt")
child_1.upload()
child_1.finalize()

# Second branch, also derived from the root
child_2 = Dataset.create(dataset_name="child_2", dataset_project="some_project", parent_datasets=[root.id])
child_2.add_files("child_2.txt")
child_2.upload()
child_2.finalize()

# Merged dataset: both branches are listed as parents
merger = Dataset.create(dataset_name="merger", dataset_project="some_project", parent_datasets=[child_1.id, child_2.id])
# will print ['a.txt', 'child_1.txt', 'child_2.txt']
print(merger.list_files())
Erez from ClearML here :) To merge these datasets, just specify their IDs as parents of a new dataset and it should merge them!
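If it helps to picture what "specifying parents" means for the diamond-shaped graph, here is a rough toy model in plain Python. It is only an illustration and not how ClearML is implemented: `ToyDataset` and its fields are made-up names, and it assumes a dataset's file list is simply its own files plus everything inherited from its parents.

```python
# Toy model only: NOT ClearML's implementation, just an illustration
# of a dataset inheriting file names from its parents.
from dataclasses import dataclass, field


@dataclass
class ToyDataset:  # hypothetical class, not part of clearml
    name: str
    own_files: set = field(default_factory=set)
    parents: list = field(default_factory=list)

    def list_files(self):
        # Start with this dataset's own files, then add every parent's files.
        files = set(self.own_files)
        for parent in self.parents:
            files.update(parent.list_files())
        return sorted(files)


root = ToyDataset("root", {"a.txt"})
child_1 = ToyDataset("child_1", {"child_1.txt"}, [root])
child_2 = ToyDataset("child_2", {"child_2.txt"}, [root])

# The "merge" is just a new node with both branches as parents.
merger = ToyDataset("merger", parents=[child_1, child_2])
print(merger.list_files())
```

Because the files are collected into a set, the shared root file shows up only once even though both branches inherit it.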