
I want to loop over my files in batches so that I get 32 images from each sub-directory at a time (I can't load all the images at once because of memory limits), e.g. load images 1-32 of every directory, use them, then load images 33-64, then 65-96, etc.

My directory:

Rootdir
  - dir1
    - img 1
    - img 2
    - img...
  - dir2
    - img 5000001
    - img 5000002
    - img...
  - dir3
    - img 10000001
    - img 10000002
    - img...

So on the first loop I would need to load img1, 2, ..., 32, 5000001, ..., 5000032, 10000001, ..., 10000032, then on the second loop img33, 34, ..., 64, 5000033, ..., 5000064, 10000033, ..., 10000064.

Is there a way to do this properly?

I am trying to use os.walk, which lets me loop over my directories, but I don't see how to adapt this loop to produce the batches of 32 I need:

import os
from keras.preprocessing.image import load_img  # Keras 2.x image loader, as used in the question

rootdir = "Rootdir"
imgs = []
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
        except Exception as e:
            print(str(e), fname)
    # do something on imgs

EDIT

All of your suggestions give me something like this:

dir1/img1.jpg to dir1/img32.jpg then dir1/img33.jpg to dir1/img64.jpg then ...

then dir2/img1.jpg to dir2/img32.jpg then dir2/img33.jpg to dir2/img64.jpg then ...

then dir3/img1.jpg to dir3/img32.jpg then dir3/img33.jpg to dir3/img64.jpg :(

What I'm trying to achieve is:

Files 1 to 32 of dir1 + files 1 to 32 of dir2 + files 1 to 32 of dir3, then

Files 33 to 64 of dir1 + files 33 to 64 of dir2 + files 33 to 64 of dir3, with each group handled in a single loop iteration.

Hadrien Berthier
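A minimal sketch of the interleaved order described above, assuming the images sit directly inside each sub-directory of the root and that sorting file names gives the intended order (plain string sorting puts img10 before img2, so zero-padded names or a natural-sort key may be needed). The idea is to list and sort each sub-directory's file names once, which is much cheaper than loading images, and then take the same slice of every list on each iteration. interleaved_batches is a hypothetical helper name, not part of any library.

```python
import os


def interleaved_batches(rootdir, batch_size=32):
    """Yield lists of paths: files [start:start + batch_size] of every sub-directory."""
    # Collect the sorted file names of each immediate sub-directory (names only, no image data).
    per_dir = []
    for entry in sorted(os.scandir(rootdir), key=lambda e: e.name):
        if entry.is_dir():
            names = sorted(f.name for f in os.scandir(entry.path) if f.is_file())
            per_dir.append((entry.path, names))

    longest = max((len(names) for _, names in per_dir), default=0)
    for start in range(0, longest, batch_size):
        batch = []
        for dirpath, names in per_dir:
            # The same slice of every directory; shorter directories contribute fewer (or no) files.
            batch.extend(os.path.join(dirpath, n) for n in names[start:start + batch_size])
        yield batch
```

Each yielded list holds at most batch_size paths per sub-directory (96 for three directories with batch_size=32), so only those images need to be in memory at once, e.g. imgs = [load_img(p) for p in paths] inside a for paths in interleaved_batches(rootdir) loop.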

4 Answers


os.walk already returns a generator that yields (dirpath, dirnames, filenames) 3-tuples on the fly, so you only need to yield slices of the filenames list in batches of 32.


This is an example:

import os

# Your root directory path
rootdir = r"Root"

# Your batch size
batch_size = 32

def walk_dirs(directory, batch_size):
    walk_dirs_generator = os.walk(directory)
    for dirname, subdirectories, filenames in walk_dirs_generator:
        filenames.sort()  # os.walk does not guarantee any particular order
        for i in range(0, len(filenames), batch_size):
            # slice the filenames list: 0-31, 32-63 and so on
            yield [os.path.join(dirname, filename) for filename in filenames[i:i + batch_size]]

# Finally, iterate over walk_dirs, which is itself a generator function
for file_name_batch in walk_dirs(rootdir, batch_size):
    for file_name in file_name_batch:
        # Do some processing on the batch now
        print(file_name)
Kunal Mukherjee
  • Again, this gives me batches of 32 files from one directory at a time. For example, I have 3 directories, kitchen, bathroom and bedroom: with your code I get kitchen/img1.jpg to kitchen/img32.jpg and then kitchen/img33.jpg to kitchen/img64.jpg. I don't want that; I want kitchen/img1.jpg to kitchen/img32.jpg + bathroom/img1.jpg to bathroom/img32.jpg + bedroom/img1.jpg to bedroom/img32.jpg in the same loop iteration – Hadrien Berthier Mar 04 '19 at 13:22
  • I don't know if what I'm trying to achieve is clear, but I need the first batch of files of every directory yielded at once, then the second batch of every directory, etc. – Hadrien Berthier Mar 04 '19 at 13:35
  • @HadrienBerthier yeah got the requirement, let me check – Kunal Mukherjee Mar 04 '19 at 13:36
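To see why this generator produces batches from one directory at a time: os.walk visits directories one after another (top-down by default) and yields one (dirpath, dirnames, filenames) tuple per directory, starting with the root itself, which in this layout has an empty filenames list. The order of dirnames and filenames follows the OS listing and is not guaranteed to be sorted; sorting dirnames in place (top-down mode only) controls the order in which sub-directories are visited. A small sketch on a throwaway tree created with tempfile; show_walk is a hypothetical name:

```python
import os
import tempfile


def show_walk(rootdir):
    """Return (relative dirpath, sorted filenames) for every tuple os.walk yields."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        dirnames.sort()  # in-place sort makes os.walk visit sub-directories in name order
        rows.append((os.path.relpath(dirpath, rootdir), sorted(filenames)))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for d in ("dir1", "dir2"):
            os.mkdir(os.path.join(root, d))
            for i in (1, 2):
                open(os.path.join(root, d, f"img{i}.jpg"), "w").close()
        for row in show_walk(root):
            print(row)
        # ('.', [])
        # ('dir1', ['img1.jpg', 'img2.jpg'])
        # ('dir2', ['img1.jpg', 'img2.jpg'])
```

Because each tuple carries the files of exactly one directory, slicing filenames can only ever produce single-directory batches; combining directories requires collecting several tuples first.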

You could take a look at os.walk()

EDIT: simple counter example

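mylist = range(100)  # placeholder: the items (e.g. image paths) to process
todo_list = []
counter = 0
for x in mylist:
    # do something with x
    todo_list.append(x)
    counter += 1
    if counter % 32 == 0:
        # do something with todo_list
        todo_list = []  # empty todo list for next batch
if todo_list:
    pass  # do something with the last, shorter batch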
Georges Lorré
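The counter pattern above can also be written as a small generator that cuts any iterable into lists with itertools.islice, which also yields the final, shorter batch without extra bookkeeping. This is a sketch: batched is a hypothetical helper name here (Python 3.12 added itertools.batched, which yields tuples rather than lists). Like the counter, it only groups items in the order it receives them, so on its own it does not interleave directories.

```python
from itertools import islice


def batched(iterable, n):
    """Yield successive lists of up to n items from iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take up to n items from where the last chunk stopped
        if not chunk:  # iterator exhausted
            return
        yield chunk
```

For example, list(batched(range(7), 3)) gives [[0, 1, 2], [3, 4, 5], [6]].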

What about always using the same imgs list and processing it as soon as you have 32 images?

import os
from keras.preprocessing.image import load_img  # Keras 2.x image loader, as in the question

rootdir = "Rootdir"
imgs = []
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
            if len(imgs) == 32:
                print("Doing what I have to with current imgs list (add your function here)")
                imgs = []  # clear the imgs list for the next batch
        except Exception as e:
            print(str(e))
# do something on the remaining imgs (fewer than 32)

If you need to keep track of all the previous lists, you can simply copy the list contents over.

Let me know if you want that implementation too.

Pitto
  • With this method I load the first directory's images 32 at a time, then the second directory's, etc. I need 32 images from each directory in my imgs array, then images 33-64, etc. – Hadrien Berthier Mar 04 '19 at 10:59
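On keeping the previous lists: whether a copy is needed depends on how the list is reset. Rebinding the name with imgs = [] creates a new list and leaves any stored one untouched, while imgs.clear() empties the very list object that was stored. A small sketch with integers standing in for images; collect is a hypothetical name:

```python
def collect(items, batch_size, clear_in_place):
    """Group items into batches, storing each finished batch in history."""
    history = []
    imgs = []
    for item in items:
        imgs.append(item)
        if len(imgs) == batch_size:
            history.append(imgs)  # stores a reference to the list, not a copy
            if clear_in_place:
                imgs.clear()  # also empties the list just stored in history
            else:
                imgs = []  # a new list; the stored batch is unaffected
    return history


print(collect(range(4), 2, clear_in_place=False))  # [[0, 1], [2, 3]]
print(collect(range(4), 2, clear_in_place=True))   # [[], []]
```

If you prefer clearing in place, store a shallow copy instead, e.g. history.append(list(imgs)).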

Okay, I found a way. It's not the most beautiful, but here it is: I use a set to remember which files I have already seen, and I skip (continue past) any file that is in the set so it doesn't count towards the current batch.

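import os

import cv2
import numpy as np

rootdir = "Rootdir"
img_width, img_height = 224, 224  # placeholder target size, set to your own
# total number of images across all sub-directories
data_number = sum(len(files) for _, _, files in os.walk(rootdir))

number_of_directory = 17
batch_size = 32
seen = set()
# the original wrapped this range in a progress bar (pbar)
for overall_count in range(data_number // (batch_size * number_of_directory)):
    imgs = []
    for dirName, subdirList, fileList in os.walk(rootdir):
        count = 0
        for fname in sorted(fileList):
            img_path = os.path.join(dirName, fname)
            if img_path in seen:  # already loaded in an earlier batch
                continue
            if count == batch_size:
                break
            try:
                img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                img = cv2.resize(img, (img_width, img_height))
                imgs.append(np.array(img))
            except Exception as e:
                print(str(e), fname)
            seen.add(img_path)
            count += 1
    # Do something with images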
Hadrien Berthier