1

I have a folder with about 10,000 images, I have a huge *.txt file as follows. The txt file has 30,000 lines. Each image has three lines, line (1) contains the image name for example "04406_8_074402.jpeg". Line (2) contains the image category, in this case, it is a cat, and --- to separate that information from the next line. It contains image file name/ path and image category:

for example:

Analysing Image: /path to image folder/images/04406_8_074402.jpeg
Object Class Presented: cat
----------------------------------
Analysing Image: /path to image folder/images/00009_8_071203.jpeg
Object Class Presented: dog
----------------------------------
Analysing Image: /path to image folder/images/04440_8_045244.jpeg
Object Class Presented: box
----------------------------------
Analysing Image: /path to image folder/images/00045_8_051505.jpeg
Object Class Presented: unclassified
.
.
.
.
.
----------------------------------
Analysing Image: /path to image folder/images/02290_8_073302.jpeg
Object Class Presented: panda 
----------------------------------

I need to categorize those images into different folders based on their class names.  I know I can read the txt file using:

with open('file.txt') as f:
    line = f.readline()
    while line:
        line = f.readline()
        print(line)

My question is how do I put those images into different folders based on their class names?  

  • Welcome to Stackoverflow. What you described is the python code for simply parsing the txt. The next thing would be to move (or copy) each image to a new directory. This directory will have folders and subfolders based on the class. However, before you do that, ensure which model/library you want to use, because some libraries are actually using this structure already for their dataloaders (for instance PyTorch). – GrigorisG Aug 03 '21 at 16:13
  • If you have 10,000 images and each one produces 3 lines of output, surely your text file should have 30,000 lines rather than 8,000? Or is there some possibility not mentioned? – Mark Setchell Aug 03 '21 at 16:55
  • Yes, it is 30,000, because each image has 3 lines. – programmer1234 Aug 03 '21 at 17:15

2 Answers2

2

I have compiled all the discussions here in this code. Please change the paths accordingly. In my code ..stack\main folder consisted all the images and I used the code below to read the ..stack\file.txt to move files based on class into ..\stack\main\cat and ..\stack\main\dog etc.

import os
import shutil
directory= r"C:\Users\VIDYA\Desktop\stack\main"

with open('file.txt') as f:
    line = f.readline()
    while line:
        line = f.readline()
        if line.startswith("Analysing Image:"):
            length = len(line)
            # Get image file name
            image=line[length -20 :-1]
            continue
        if line.startswith("Object Class Presented:"):
            word_list = line.split()  # list of words
            # get class name
            class_name=word_list[-1]
            new_folder=os.path.join(directory,class_name)
            os.makedirs(new_folder,exist_ok=True) #makes new class folder if it doesn't already exist
            source=os.path.join(directory,image)
            destination =os.path.join(directory,class_name)
            # Move the file from source to destination
            dest = shutil.move(source, destination)
Vidya Ganesh
  • 788
  • 11
  • 24
  • Thanks for the reply. I need to move images based on their class name(object class), actually, I do not need to write all images into a folder. What I need is to write cat images into the cat folder. – programmer1234 Aug 03 '21 at 16:40
  • your method writes all images to the destination folder regardless of their class. – programmer1234 Aug 03 '21 at 16:48
  • an if condition can be added to move them into a folder (either cat/dog) based on their class from the .txt – Vidya Ganesh Aug 03 '21 at 16:50
  • How many classes do you have? I see some are 'unclassified' as well. In that case we might have to automate the process of creating a new folder for unseen classes at the same time discarding unclassifieds – Vidya Ganesh Aug 03 '21 at 16:54
  • It would be appreciated if you could update the code, I attempted to run the code but did not work. – programmer1234 Aug 03 '21 at 16:55
  • I have 7 classes of objects including the unclassified category. I want to keep the unclassified category as well. I do not want to discard any of the classes, just want to create the folders and put each image into the relevant directory. I edited my post. – programmer1234 Aug 03 '21 at 17:23
  • Hey! I used your file.txt example and tested with a small set of images. It works for me and I have updated my answer accordingly. – Vidya Ganesh Aug 03 '21 at 20:01
  • Why do you make a list? Surely there's no need, read the filename and keep it, when the object class turns up make that directory and copy the file into it. Why do you need to know the object classes ahead of time, or worry how many there are? Just make the directories as they show up. – Mark Setchell Aug 03 '21 at 20:32
  • I was thinking its better to make sure we have the class names for all images by cross checking with lengths. But what you suggested makes more sense. Thank you for the suggestion @MarkSetchell. Also you are an inspiration :) – Vidya Ganesh Aug 03 '21 at 20:59
  • @Vidya Ganesh, Thank you for your response. I ran this and I was able only to extract one class out of 7 classes that I have. I was wondering if it is possible to extract all 7 classes in one run? Because the code now only extracts one class. – programmer1234 Aug 09 '21 at 16:25
  • I have updated the code to automate make directory for new classes so it should work irrespective of the number of classes you have . Previously we used if else conditions and had to copy that piece of code for different classes. Anyway I have tested this and it works for me , please let me know if you still face issues. – Vidya Ganesh Aug 09 '21 at 17:29
  • Thanks, @VidyaGanesh, I am not sure why I get this error:     source=os.path.join(directory,image) NameError: name 'image' is not definedc – programmer1234 Aug 09 '21 at 23:08
  • It's the same code that worked for you on a single class which should now work for all classes.. Make sure the paths and the file names are right.. And check the .txt file if it has "Analysing image" before "object class.." – Vidya Ganesh Aug 10 '21 at 06:00
  • And if image=line[length -20 :-1] – Vidya Ganesh Aug 10 '21 at 06:06
1

Along these lines in somewhat Pythonic pseudo-code:

while not done
    read line from file
    if line starts with "Analysing Image:"
        save image name
    else if line starts with "Object Class"
        save object class
        create directory named per object class suppressing error messages if it already exists
        move this image to that directory
Mark Setchell
  • 191,897
  • 31
  • 273
  • 432
  • Thanks, would you mind posting working code instead of pseudo-code, because the image name and the class name are not in one line. I wanted to know how to write those – programmer1234 Aug 03 '21 at 17:54
  • Try writing the loop yourself and see if you can read the lines. Then try using `line.startswith()` like this https://stackoverflow.com/a/8802889/2836621 and just print the line prefixed with "filename" or "class". – Mark Setchell Aug 03 '21 at 18:01
  • I wanted to write the images based on the object name into different folders, I do not have a problem writing the entire images into a folder. – programmer1234 Aug 03 '21 at 18:09