I am given a path that contains subdirectories such as:

---- Folder1
---- Folder2 
---- Folder3
etc.

I want to extract the first file from each folder, store it in a list and, if possible, also remove that file from its folder. I know how to do this by looping through the folders and their files with os.listdir, but that loads the full list of file names into memory.

Since each folder contains a large number of files, does anyone know an efficient way to loop through the folders and extract the first file of each subfolder into a list?

  • Take a look at [path.py](https://pathpy.readthedocs.io/en/stable/api.html), that could help (note you'll have to install this library with `pip`, this is not a builtin) – olinox14 May 16 '19 at 09:04
  • Maybe [this](https://stackoverflow.com/questions/25550919/listing-files-in-a-directory-with-python-when-the-directory-is-huge) would be helpful – Zionsof May 16 '19 at 09:07
  • https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory might help. – Basya May 16 '19 at 09:07
  • What is the first file here? Or more exactly what is the order? Lexicographic, modification time, the first in the folder internal structure, ...? – Serge Ballesta May 16 '19 at 09:10
  • @SergeBallesta the first file in each sub-directory is an image file. – Sayooj Balakrishnan May 17 '19 at 08:16

1 Answer

You can iterate through all subfolders of the given folder with the os.walk function and take the first file you need from each one. Note that os.walk also visits the given folder itself and any nested subfolders. Here is an example that picks the lexicographically first file:

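import os

# your_folder is a placeholder for the path of the top-level folder
result_files = []
for root, dirs, files in os.walk(your_folder):
    if files:
        # min() gives the lexicographically first name without sorting the list;
        # join it with root so the full path is stored, not just the file name
        result_files.append(os.path.join(root, min(files)))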

Each tuple yielded by os.walk contains:

  • The path of the current folder
  • The names of the subfolders in the current folder
  • The names of the files in the current folder
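
If memory is the main concern, os.scandir (Python 3.5+) may be a better fit, since it returns a lazy iterator rather than a full list. Below is a minimal sketch, assuming "first" can mean the first file the operating system happens to return, which is an arbitrary order and not a lexicographic one. The function name first_file_in_each_subfolder and its remove parameter are made up for this illustration.

```python
import os


def first_file_in_each_subfolder(base_path, remove=False):
    """Collect one file path per immediate subfolder of base_path.

    Hypothetical helper: "first" means the first regular file that
    os.scandir yields, which is in arbitrary (filesystem) order.
    """
    first_files = []
    with os.scandir(base_path) as subfolders:
        for folder in subfolders:
            if not folder.is_dir():
                continue  # skip plain files at the top level
            with os.scandir(folder.path) as entries:
                # next() stops at the first regular file, so the rest of
                # the directory is never collected into a list.
                first = next((e for e in entries if e.is_file()), None)
            if first is not None:
                first_files.append(first.path)
                if remove:
                    # Optional: delete the file after recording its path.
                    os.remove(first.path)
    return first_files
```

Calling first_file_in_each_subfolder(your_folder, remove=True) would return the collected paths and delete those files. If a specific order is needed (by name or by modification time), every entry has to be examined anyway, so the saving is limited to not holding the whole list at once.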
vurmux