Browse files and subfolders in Python

Question

I'd like to browse through the current folder and all its subfolders and get all the files with .htm|.html extensions. I have found out that it is possible to find out whether an object is a dir or file like this:

import os

dirList = os.listdir("./") # current directory
for dir in dirList:
  if os.path.isdir(dir):
    # I don't know how to get into this dir and do the same thing here
    pass
  else:
    # I got a file and I can regexp if it is .htm|html
    pass

In the end, I would like to have all the files and their paths in an array. Is something like that possible?

possible duplicate of [How to traverse through the files in a directory?](http://stackoverflow.com/questions/4918458/how-to-traverse-through-the-files-in-a-directory) — S.Lott, Apr 28 '11 at 11:12

Sven Marnach · Accepted Answer · 2018-06-01T17:06:26.590

164

You can use os.walk() to recursively iterate through a directory and all its subdirectories:

import os

path = "."  # directory to start from
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".html", ".htm")):
            print(os.path.join(root, name))  # or whatever you need

To build a list of these names, you can use a list comprehension:

htmlfiles = [os.path.join(root, name)
             for root, dirs, files in os.walk(path)
             for name in files
             if name.endswith((".html", ".htm"))]
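As an alternative that this answer does not mention, the same list can probably be built with the standard `pathlib` module. The sketch below is only an illustration: the function name `find_html_files` is made up for this example, and it additionally matches upper-case extensions, which the `os.walk()` version above does not.

```python
from pathlib import Path


def find_html_files(top="."):
    """Return a sorted list of Path objects for .htm/.html files under `top`."""
    wanted = {".htm", ".html"}
    return sorted(
        p
        for p in Path(top).rglob("*")  # rglob("*") visits every entry recursively
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Calling `find_html_files(".")` returns `Path` objects; wrap each one in `str()` if plain strings are needed.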

edited Jun 01 '18 at 17:06

answered Apr 28 '11 at 10:35

Sven Marnach

4

I think some nuances worth mentioning are that it will traverse/include hidden files, and that this also doesn't resolve links for you. It's also not guaranteed that every file/directory enumerated will exist (mostly due to the fact a link can exist, but its target may not). [Some further reading](https://docs.python.org/2.7/library/os.html#os.readlink) about resolving links might be helpful to some, depending on how you intend to use `os.walk`. – Jun 01 '18 at 16:33
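The nuances in this comment could be handled roughly as follows. This is a sketch under assumptions: "hidden" is taken to mean a name starting with a dot (the Unix convention), and the helper name `walk_visible_files` is invented for the example.

```python
import os


def walk_visible_files(top="."):
    """Yield paths of existing regular files under `top`, skipping dot-entries."""
    # followlinks=False is the default: symlinked directories are not entered.
    for root, dirs, files in os.walk(top, followlinks=False):
        # Editing `dirs` in place tells os.walk() not to descend into
        # the removed directories (this works because topdown=True by default).
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if name.startswith("."):
                continue
            path = os.path.join(root, name)
            # os.path.isfile() follows links, so a dangling symlink is skipped.
            if os.path.isfile(path):
                yield path
```

Passing `followlinks=True` instead would make `os.walk()` descend into symlinked directories, at the risk of infinite recursion if a link points back to a parent directory.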

score 18 · Answer 2 · answered Jul 11 '18 at 05:23

18

I had a similar thing to work on, and this is how I did it.

import os

rootdir = os.getcwd()

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = os.path.join(subdir, file)

        if filepath.endswith(".html"):
            print(filepath)

Hope this helps.

answered Jul 11 '18 at 05:23

Pragyaditya Das

1

@Pragyaditya_Das, brilliant! – Mark K Feb 19 '19 at 06:19

Spas · Answer 3 · 2022-12-15T17:56:11.350

8

In Python 3 (version 3.5 or later) you can use os.scandir():

import os

def dir_scan(path):
    for i in os.scandir(path):
        if i.is_file():
            print('File: ' + i.path)
        elif i.is_dir():
            print('Folder: ' + i.path)
            dir_scan(i.path)  # recurse into the subfolder
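To get the .htm/.html paths the question asks for rather than printing every entry, the same recursion can be written as a generator. This is a minimal sketch, assuming Python 3.6+ (for the `with` form of `os.scandir()`); the name `iter_html_files` is hypothetical.

```python
import os


def iter_html_files(path="."):
    """Recursively yield paths of .htm/.html files below `path`."""
    with os.scandir(path) as entries:  # closes the directory handle when done
        for entry in entries:
            # follow_symlinks=False avoids descending into symlinked folders.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_html_files(entry.path)
            elif entry.is_file() and entry.name.lower().endswith((".html", ".htm")):
                yield entry.path
```

`list(iter_html_files("."))` then gives all matching paths as a list.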

edited Dec 15 '22 at 17:56

answered Sep 12 '18 at 15:12

Spas

1

This answer is not very appropriate, because `os.scandir()` does NOT iterate through all the subfolders (as requested in the question). `os.walk()` is better, even in Python 3, as in the accepted answer. – Dan Stowell Dec 04 '22 at 17:50
@DanStowell you were right that it wasn't going through the subfolders. I changed my answer so it loops through every file in every subfolder. os.scandir() is supposed to be faster than os.walk() - https://peps.python.org/pep-0471/ – Spas Dec 15 '22 at 17:58

score 5 · Answer 4 · edited Feb 17 '17 at 00:08

Use newDirName = os.path.join(parentDir, dir), where parentDir is the directory currently being listed, to build the full path of the subdirectory (os.path.abspath(dir) only gives the right result for entries of the current working directory). Then list its contents as you have done with the parent, i.e. newDirList = os.listdir(newDirName).

You can turn your code snippet into a separate function and call it recursively through the subdirectory structure. Its first parameter is the directory pathname, which changes for each subdirectory.

This answer is based on the 3.1.1 version documentation of the Python Library. There is a good model example of this in action on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access). Good Luck!
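The recursive approach described in this answer might look like the sketch below. The function name `list_html_files` is an assumption made for illustration, and each entry is joined with the directory being listed so that nested subdirectories resolve correctly.

```python
import os


def list_html_files(directory):
    """Return paths of .htm/.html files in `directory` and all its subfolders."""
    found = []
    for name in os.listdir(directory):
        # os.listdir() returns bare names, so join them with the parent directory.
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            # Same procedure, one level down.
            found.extend(list_html_files(full_path))
        elif name.endswith((".html", ".htm")):
            found.append(full_path)
    return found
```

For example, `list_html_files(".")` would start from the current directory, as in the question.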

score 0 · Answer 5 · answered Jan 05 '14 at 21:14

0

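A slightly altered version of Sven Marnach's solution:

import os


def create_file_list(path):
    return_list = []

    # Each item yielded by os.walk() is a (root, dirs, files) tuple; the two
    # inner loops visit every element of that tuple, so a directory whose
    # name ends in ".txt" is collected as well.
    for filenames in os.walk(path):
        for file_list in filenames:
            for file_name in file_list:
                if file_name.endswith(".txt"):
                    return_list.append(file_name)

    return return_list


# Raw string, so the backslash is not treated as an escape character.
folder_location = r'C:\SomeFolderName'
file_list = create_file_list(folder_location)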

answered Jan 05 '14 at 21:14

campervancoder

For some reason there are extra spaces and the for block indentation is not right in the above paste.. SO's markup does not like me.. – campervancoder Jan 05 '14 at 21:17
3

Poor rework of simple code - replacing tuple assignment with embedded loops makes code less readable, and probably less efficient too – volcano Jan 05 '14 at 21:23
Thanks for the comment @volcano.. The example above did not seem to work hence the additional for loop.. – campervancoder Jan 07 '14 at 19:48

Yi2021 · Answer 6 · 2022-07-24T08:53:16.800

0

Two approaches work for me.

1. Use the `os` module and `__file__` to build the path relative to the directory where the script is located:

import os
script_dir = os.path.dirname(__file__)      

path = 'subdirectory/test.txt'
file = os.path.join(script_dir, path)
fileread = open(file,'r') 


2. Use '\\' as the path separator to read or write a file in a subfolder (this form only works on Windows):

fileread = open('subdirectory\\test.txt','r')

edited Jul 24 '22 at 08:53

answered Jul 23 '22 at 10:50

Yi2021

Do not paste the same [answer](https://stackoverflow.com/a/73091606/7758804) to multiple questions. This has been flagged to a moderator. – Trenton McKinney Jul 23 '22 at 17:05

score -1 · Answer 7 · answered Jan 03 '21 at 12:35

-1

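import os
from tkinter import Tk, filedialog  # filedialog is a submodule; "from tkinter import *" does not import it

root = Tk()
file = filedialog.askdirectory()
if file:  # an empty result means the dialog was cancelled
    changed_dir = os.listdir(file)
    print(changed_dir)
root.mainloop()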

answered Jan 03 '21 at 12:35

Akshat Mishra
