
I want to train a ControlNet myself, but I find it inconvenient to prepare the datasets. Following the Hugging Face tutorial at https://huggingface.co/blog/train-your-controlnet, I understand that I should organize the dataset in the Hugging Face `datasets` format. My intention is to train the ControlNet with various prompt settings and compare the outcomes. However, this means creating a separate dataset for each experiment, which is time-consuming and space-inefficient because the images and conditioning images are identical across all of them.

Should I create multiple datasets that differ solely in the prompt column, or is there a more efficient approach to accomplish this?

Yun

1 Answer


Create multiple datasets that contain only the prompt column (e.g. controlnet_prompts_1, controlnet_prompts_2, etc.) and one single dataset that has the images, conditioning images, and all other columns except the prompt column (e.g. controlnet_features).

Then, whenever you want to use a particular prompt dataset together with the main features dataset, use `concatenate_datasets` to join them column-wise, like this (see the `concatenate_datasets` documentation):

from datasets import load_dataset, concatenate_datasets

username = ""  # add your username here

for i in range(10):  # assuming you have 10 distinct prompt settings
    controlnet_prompts = load_dataset(f"{username}/controlnet_prompts_{i+1}", split="train")
    controlnet_features = load_dataset(f"{username}/controlnet_features", split="train")
    # axis=1 concatenates column-wise: both datasets must have the same
    # number of rows, with row i of the prompts matching row i of the features
    dataset = concatenate_datasets([controlnet_prompts, controlnet_features], axis=1)
    # perform your experiments with the dataset
Ruan
  • What is the minimum dataset size, say, that can train the model well? (tens, hundreds, thousands?) – logame Aug 28 '23 at 07:30