I just followed the code here (with minor modifications for sklearn 0.17). In that example, the data are just lists or numpy arrays. Now I want to prepare a toy training dataset on disk and use datasets.load_files to load it for multilabel classification. However, simply following the load_files convention and copying the same file into multiple folders doesn't produce a list of lists (i.e. label sets) for dataset.target.
So what is the correct way to prepare a dataset for multilabel classification?
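One possible workaround, as a rough sketch rather than a confirmed answer: load_files assigns each file a single label taken from its parent folder, so the label sets could be recovered by grouping files that share a name across the label folders. The folder names, file names, and the helpers build_toy_dataset and collect_label_sets below are hypothetical and use only the standard library.

```python
import os
import tempfile
from collections import defaultdict


def build_toy_dataset(root, documents):
    """Write each document into one folder per label (the load_files layout)."""
    for filename, (text, labels) in documents.items():
        for label in labels:
            folder = os.path.join(root, label)
            os.makedirs(folder, exist_ok=True)
            with open(os.path.join(folder, filename), "w") as f:
                f.write(text)


def collect_label_sets(root):
    """Group files by name across label folders.

    Returns (filenames, texts, label_sets), where label_sets[i] is the sorted
    list of folder names that contain filenames[i].
    """
    labels_by_file = defaultdict(set)
    text_by_file = {}
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            continue  # ignore stray files at the top level
        for filename in os.listdir(folder):
            labels_by_file[filename].add(label)
            with open(os.path.join(folder, filename)) as f:
                text_by_file[filename] = f.read()
    filenames = sorted(labels_by_file)
    texts = [text_by_file[name] for name in filenames]
    label_sets = [sorted(labels_by_file[name]) for name in filenames]
    return filenames, texts, label_sets


# Hypothetical toy data: file name -> (text, labels)
toy = {
    "doc1.txt": ("new york is big", ["new_york"]),
    "doc2.txt": ("london is rainy", ["london"]),
    "doc3.txt": ("flights from london to new york", ["london", "new_york"]),
}

root = tempfile.mkdtemp()
build_toy_dataset(root, toy)
filenames, texts, label_sets = collect_label_sets(root)
print(label_sets)  # [['new_york'], ['london'], ['london', 'new_york']]
```

If this grouping is acceptable, label_sets is a list of lists that could presumably be passed to sklearn.preprocessing.MultiLabelBinarizer().fit_transform to get the binary indicator matrix, but I am not sure this is the intended way.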