
First I created a root dataset (A). Then I created two branch datasets derived from the root dataset.

How do I "merge" these two branches to form another dataset?

Basically, the graph looks like an inverted diamond shape.

koayst

2 Answers


Disclaimer: I'm part of the ClearML team

To merge datasets, create a new dataset and pass the IDs of the datasets you want to merge as parent_datasets:

from clearml import Dataset

# Root dataset (A)
root = Dataset.create(dataset_name="root", dataset_project="some_project")
root.add_files("a.txt")
root.upload()
root.finalize()

# First branch, derived from the root
child_1 = Dataset.create(dataset_name="child_1", dataset_project="some_project", parent_datasets=[root.id])
child_1.add_files("child_1.txt")
child_1.upload()
child_1.finalize()

# Second branch, also derived from the root
child_2 = Dataset.create(dataset_name="child_2", dataset_project="some_project", parent_datasets=[root.id])
child_2.add_files("child_2.txt")
child_2.upload()
child_2.finalize()

# Merge: list both branches as parents of the new dataset
merger = Dataset.create(dataset_name="merger", dataset_project="some_project", parent_datasets=[child_1.id, child_2.id])
# will print ['a.txt', 'child_1.txt', 'child_2.txt']
print(merger.list_files())
Martin.B

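Erez from ClearML here :) To merge these datasets, pass both of their IDs in parent_datasets when you create the new dataset (for example parent_datasets=[child_1.id, child_2.id]), and the new dataset should contain the files of both.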
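
If it helps to picture what "merging via parents" means for the diamond graph, here is a toy model in plain Python. It does not call ClearML at all: the dicts stand in for dataset file listings, merge_parents is a hypothetical helper, and the rule that a later parent wins on a path clash is an assumption made for the sketch, not confirmed library behaviour.

```python
# Toy model only: plain dicts stand in for dataset file listings (no clearml calls).
def merge_parents(*parents):
    """Combine parent listings; on a path clash the later parent wins (assumed)."""
    merged = {}
    for listing in parents:
        merged.update(listing)
    return merged


# Diamond graph: root -> child_1 and child_2 -> merger
root = {"a.txt": "root"}
child_1 = {**root, "child_1.txt": "child_1"}  # inherits a.txt, adds its own file
child_2 = {**root, "child_2.txt": "child_2"}  # inherits a.txt, adds its own file

merger = merge_parents(child_1, child_2)
print(sorted(merger))  # ['a.txt', 'child_1.txt', 'child_2.txt']
```

The shared a.txt appears once in the result because both branches inherited it from the same root.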

Erez
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 23 '22 at 07:09