
I'm pretty new to CNNs and I need to build a pipeline that loads the images and also gets them ready for the CNN. The problem is that I need to build a dataset made up of images. There are three classes of images: COVID-19, healthy lungs and pneumonia. The files that I have are:

  • 1 folder containing images of lungs with covid-19
  • 1 folder containing images of healthy lungs
  • 1 folder containing images with pneumonia
  • 1 .txt file that lists all the images from which the training dataset will be formed
  • 1 .txt file that lists all the images from which the validation dataset will be formed
  • 1 .txt file that lists all the images from which the test dataset will be formed

I've been searching on the Internet, but I can't find a way to build a dataset made of all the images, let alone how to relate them to the .txt files and build the corresponding training, test and validation datasets. Any suggestions? Below is the structure of one of the .txt files as an example:

2   PNEUMONIA/person888_bacteria_2812.jpeg
2   PNEUMONIA/person1209_bacteria_3161.jpeg
2   PNEUMONIA/person1718_bacteria_4540.jpeg
2   PNEUMONIA/person549_bacteria_2303.jpeg
2   PNEUMONIA/person831_bacteria_2742.jpeg
2   PNEUMONIA/person1571_bacteria_4108.jpeg
2   PNEUMONIA/person1310_bacteria_3300.jpeg
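Going by the sample above, each line seems to hold an integer class label followed by a path relative to the folder that contains the class folders. A minimal sketch for reading one split file into (label, path) pairs, assuming exactly that whitespace-separated layout (the function name `read_split_file` is hypothetical):

```python
def read_split_file(txt_path):
    """Return a list of (label, relative_path) tuples from one split file.

    Assumes each non-empty line is '<integer label> <path relative to the
    image root>', separated by whitespace, as in the sample in the question.
    """
    entries = []
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first run of whitespace: label, then path.
            label, rel_path = line.split(maxsplit=1)
            entries.append((int(label), rel_path))
    return entries
```

Calling it once per split (for example on hypothetical `train.txt`, `val.txt` and `test.txt` files) gives one list of images per dataset to work from.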
Panri93
  • You can write your own custom data generator, but if you don't need any special augmentations or anything like that, you can just use Keras' `ImageDataGenerator` class. The method `flow_from_directory` is what you are looking for (it loops over the sub-directories and treats each sub-directory as a different class). [link_to_documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow_from_directory) – alivne May 23 '20 at 18:34
  • This looks nice for creating a whole dataset of images! After doing this, I would need to create the training, validation and test datasets containing the images specified in the .txt files. How can I read the dataset, link it to the .txt files and create the new datasets? – Panri93 May 23 '20 at 18:48
  • 3 options: (1) if you want to use this class, you can use the `validation_split` argument to set the fraction of the data to be used as the validation set. (2) However, if you have already chosen the split yourself and want to use it, you can use the `flow_from_dataframe` method, but you need to create the DataFrame yourself. (3) Save the test and train images in different locations yourself (keeping the sub-directories per label), and create a different generator for each data role. – alivne May 23 '20 at 18:54
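Option (2) could look roughly like this. It is a sketch assuming the `<label> <relative path>` layout from the sample, no spaces in file names, and the TensorFlow 2.x Keras API (`ImageDataGenerator` is marked as deprecated in newer TensorFlow releases). The helper names, the 224x224 target size and the `data` image folder in the usage comments are hypothetical:

```python
import pandas as pd


def split_to_dataframe(txt_path):
    """Read one split file into a DataFrame with 'class' and 'filename' columns.

    Assumes whitespace-separated '<label> <relative path>' lines and no spaces
    inside file names. Labels are kept as strings because flow_from_dataframe
    expects string labels when class_mode="categorical".
    """
    return pd.read_csv(txt_path, sep=r"\s+", header=None,
                       names=["class", "filename"], dtype=str)


def make_generator(df, image_root, classes, shuffle,
                   target_size=(224, 224), batch_size=32):
    """Build a Keras iterator yielding (images, one-hot labels) batches."""
    # Imported here so split_to_dataframe can be used without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255)  # scale pixels to [0, 1]
    return datagen.flow_from_dataframe(
        df,
        directory=image_root,    # folder that contains PNEUMONIA/ and the other class folders
        x_col="filename",
        y_col="class",
        classes=classes,         # same list for every split keeps label indices consistent
        target_size=target_size,
        class_mode="categorical",
        batch_size=batch_size,
        shuffle=shuffle,
    )

# Example usage (file names, folder and label set are hypothetical):
# classes = ["0", "1", "2"]
# train_gen = make_generator(split_to_dataframe("train.txt"), "data", classes, shuffle=True)
# val_gen = make_generator(split_to_dataframe("val.txt"), "data", classes, shuffle=False)
```

Passing the same `classes` list to every call matters because otherwise each generator infers its own class-to-index mapping from whatever labels happen to appear in that split.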

1 Answer


Is it necessary that you follow the .txt files to make the train and validation sets?

If not, you could:

  • make a train/ directory
  • make a train/covid directory
  • make a train/healthy directory
  • make a train/pneumonia directory

then throw everything into the respective directories, and randomly move a fraction of the images in each of them to a similarly structured validation directory.
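A sketch of that random split using only the standard library, assuming `train/covid`, `train/healthy` and `train/pneumonia` already hold all the images; the function name, the 20% default fraction and the seed are hypothetical choices:

```python
import random
import shutil
from pathlib import Path


def move_random_fraction(train_dir, val_dir, fraction=0.2, seed=42):
    """Move a random fraction of each class's images from train_dir to val_dir.

    Assumes train_dir/<class_name>/ already holds all images of that class;
    matching <class_name>/ sub-folders are created under val_dir.
    Returns a dict {class_name: number of files moved}.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    moved = {}
    for class_dir in sorted(p for p in Path(train_dir).iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_val = int(len(files) * fraction)  # per-class count keeps class proportions
        target = Path(val_dir) / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for src in rng.sample(files, n_val):
            shutil.move(str(src), str(target / src.name))
        moved[class_dir.name] = n_val
    return moved
```

Each of `train/` and `validation/` can then be given its own `flow_from_directory` call, which treats every sub-folder as a class.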

Otherwise, you should read each .txt file, pick the specified files and move them to the target folder.

  • Yes, it is necessary. Each folder must contain the specific images listed in the .txt files. How can I read the .txt files and move the images? – Panri93 May 24 '20 at 08:14
  • You can make a list and move the files by looping over it. This may help: https://stackoverflow.com/questions/3277503/how-to-read-a-file-line-by-line-into-a-list – Javier Espinoza May 25 '20 at 01:37
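Following that suggestion, a minimal sketch that loops over the lines of one .txt file and copies each listed image into a per-split folder, assuming the `<label> <relative path>` layout from the question; `copy_split` and the folder names in the usage comment are hypothetical. It copies rather than moves so the original folders stay intact; replacing `shutil.copy2` with `shutil.move` moves the files instead, as the answer describes:

```python
import shutil
from pathlib import Path


def copy_split(txt_path, image_root, split_dir):
    """Copy every image listed in txt_path into split_dir/<class folder>/.

    Assumes each line is '<label> <path relative to image_root>', e.g.
    '2   PNEUMONIA/person888_bacteria_2812.jpeg', and reuses the folder part
    of that path (PNEUMONIA/) as the class sub-directory.
    Returns the number of files copied.
    """
    count = 0
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            _label, rel_path = line.split(maxsplit=1)
            src = Path(image_root) / rel_path
            dst = Path(split_dir) / rel_path  # keeps PNEUMONIA/... as the class folder
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            count += 1
    return count

# Example usage (hypothetical paths):
# copy_split("train.txt", "images", "dataset/train")
# copy_split("val.txt", "images", "dataset/val")
# copy_split("test.txt", "images", "dataset/test")
```

Running it once per .txt file produces one folder tree per split with one sub-folder per class, which matches option (3) from the comments on the question.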