6

I am new to Python and currently working on data analysis.

I am trying to open multiple folders in a loop and read all the files in each folder. For example, the working directory contains 10 folders that I need to open, and each folder contains 10 files.

My code to list the .txt files in one folder:

file_open = glob.glob("home/....../folder1/*.txt")

I want to open folder 1 and read all its files, then go to folder 2 and read all its files, and so on up to folder 10. Can anyone help me write a loop that opens each folder, and tell me which libraries I need?

My background is in R. There, I could write a loop that opens the folders and files like this:

folder_open <- dir("......./main/")
for (n in seq_along(folder_open)) {
    file_open <- dir(paste0("......./main/", folder_open[n]))

    for (k in seq_along(file_open)) {
        file_lines <- readLines(paste0("...../main/", folder_open[n], "/", file_open[k]))
        # Finally I can read all folders and files.
    }
}
nyr1o
Tiny_Y
  • Does [this](https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory) help? – GalAbra Mar 06 '18 at 16:48
  • None of the answers actually answer the question! The question is about a specific list of directories from folder1 to folder10, not all directories (of which there could be thousands). – PhilHibbs Jun 06 '22 at 10:19

6 Answers

5

This recursive method will scan all directories within a given directory and then print the names of the txt files. I kindly invite you to take it forward.

import os

def scan_folder(parent):
    # iterate over all the files in directory 'parent'
    for file_name in os.listdir(parent):
        if file_name.endswith(".txt"):
            # if it's a txt file, print its name (or do whatever you want)
            print(file_name)
        else:
            current_path = os.path.join(parent, file_name)
            if os.path.isdir(current_path):
                # if we're checking a sub-directory, recursively call this method
                scan_folder(current_path)

scan_folder("/example/path")  # Insert parent directory's path
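
One possible way to take this forward, offered as a sketch rather than a definitive version: the same recursion can yield full paths instead of printing bare file names, so that the files can then be opened and read. The names iter_txt_files and read_all are hypothetical helpers introduced here for illustration; they are not part of the answer above.

```python
import os


def iter_txt_files(parent):
    """Yield the full path of every .txt file under 'parent', recursively."""
    for name in sorted(os.listdir(parent)):
        current_path = os.path.join(parent, name)
        if os.path.isdir(current_path):
            # Descend into the sub-directory and pass its results along.
            yield from iter_txt_files(current_path)
        elif name.endswith(".txt"):
            yield current_path


def read_all(parent):
    """Return a dict mapping each .txt path to its list of lines."""
    contents = {}
    for path in iter_txt_files(parent):
        with open(path) as handle:
            contents[path] = handle.read().splitlines()
    return contents
```

Calling read_all("/example/path") would then give one entry per text file, keyed by its full path, which is roughly what the nested readLines loop in the question produces.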
GalAbra
2

Given the following folder/file tree:

C:.
├───folder1
│       file1.txt
│       file2.txt
│       file3.csv
│
└───folder2
        file4.txt
        file5.txt
        file6.csv

The following code will recursively locate all .txt files in the tree:

import os
import fnmatch

for path,dirs,files in os.walk('.'):
    for file in files:
        if fnmatch.fnmatch(file,'*.txt'):
            fullname = os.path.join(path,file)
            print(fullname)

Output:

.\folder1\file1.txt
.\folder1\file2.txt
.\folder2\file4.txt
.\folder2\file5.txt
Mark Tolonen
1

Your glob() pattern is almost correct. Try one of these:

file_open = glob.glob("home/....../*/*.txt")
file_open = glob.glob("home/....../folder*/*.txt")

The first one will examine all of the text files in any first-level subdirectory of home/......, whatever that is. The second will limit itself to subdirectories named like "folder1", "folder2", etc.

I don't speak R, but this might translate your code:

import glob

for filename in glob.glob("......../main/*/*.txt"):
    with open(filename) as file_handle:
        for line in file_handle:
            pass  # placeholder: process each line of text here
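
Note that a wildcard such as folder* also matches names like folder_old or folder11. If only folder1 through folder10 are wanted, one option is to build each folder name explicitly and glob inside it. The sketch below assumes the folders really are named with a common prefix followed by a number; read_numbered_folders and its parameters are hypothetical names chosen for illustration.

```python
import glob
import os


def read_numbered_folders(main_dir, first=1, last=10, prefix="folder"):
    """Read every .txt file in main_dir/<prefix><first> ... main_dir/<prefix><last>.

    Returns a dict: folder name -> {file name -> list of lines}.
    A folder that is missing or has no .txt files maps to an empty dict.
    """
    data = {}
    for n in range(first, last + 1):
        folder = f"{prefix}{n}"
        pattern = os.path.join(main_dir, folder, "*.txt")
        files = {}
        for filename in sorted(glob.glob(pattern)):
            with open(filename) as file_handle:
                files[os.path.basename(filename)] = file_handle.read().splitlines()
        data[folder] = files
    return data
```

The outer loop plays the role of the loop over folders in the R code, and the inner loop plays the role of the loop over files.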
Robᵩ
0

I think a nice way to do that would be to use os.walk. It generates the directory tree, which you can then iterate over.

import os
directory = './'
for d in os.walk(directory):
    print(d)
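
Each item that os.walk yields is a 3-tuple of (directory path, sub-directory names, file names), so it is usually unpacked in the for statement. As a rough sketch of how that tuple might be used, the hypothetical helper txt_files_per_folder below (a name introduced here, not taken from the answer) collects the .txt file names found in each folder:

```python
import os


def txt_files_per_folder(directory):
    """Return {folder path: sorted list of .txt file names found in it}."""
    found = {}
    # os.walk visits 'directory' itself and every folder below it.
    for dirpath, dirnames, filenames in os.walk(directory):
        txt_names = sorted(name for name in filenames if name.endswith(".txt"))
        if txt_names:
            # Folders without any .txt file are left out of the result.
            found[dirpath] = txt_names
    return found
```

Joining a key with one of its values via os.path.join gives a path that can be passed to open().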
Reck
Alex
0

This code will look for all directories inside a directory, printing out the names of all files found there:

#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: print filenames one level down from starting folder
#--------*---------*---------*---------*---------*---------*---------*---------*

import os, fnmatch, sys

def find_dirs(directory, pattern):
    for item in os.listdir(directory):
        if os.path.isdir(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename


def find_files(directory, pattern):
    for item in os.listdir(directory):
        if os.path.isfile(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename



#--------*---------*---------*---------*---------*---------*---------*---------#
while True:#                       M A I N L I N E                             #
#--------*---------*---------*---------*---------*---------*---------*---------#
#                                  # Set directory
    os.chdir("C:\\Users\\Mike\\Desktop")

    for filedir in find_dirs('.', '*'):
        print ('Got directory:', filedir)
        for filename in find_files(filedir, '*'):
            print (filename)

    sys.exit() # END PROGRAM      
CopyPasteIt
0

pathlib is a good choice:

from pathlib import Path

# rglob('*.txt') is equivalent to glob('**/*.txt')
for txt_path in Path('demo/test_dir').rglob('*.txt'):
    if txt_path.is_file():
        print(txt_path.absolute())
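
Since rglob searches at every depth, a non-recursive pattern may be a closer fit when the layout is exactly one level of folders with files inside, as in the question. The following is a minimal sketch under that assumption; read_one_level is a hypothetical helper name, and it reads each file with Path.read_text:

```python
from pathlib import Path


def read_one_level(main_dir):
    """Return {folder name: {file name: text}} for main_dir/<folder>/<file>.txt."""
    data = {}
    # '*/*.txt' matches .txt files exactly one folder below main_dir.
    for txt_path in sorted(Path(main_dir).glob("*/*.txt")):
        if txt_path.is_file():
            data.setdefault(txt_path.parent.name, {})[txt_path.name] = txt_path.read_text()
    return data
```

Text files sitting directly in main_dir, or nested two or more folders deep, are skipped by this pattern.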
Carson