
I'm trying to iterate over all directories in a directory and find all the .html files there. So far I have this code:

import os

from bs4 import BeautifulSoup


def find_path():
    """

    :return: List
    """
    paths = []
    for filename in os.listdir(DIRECTORY):
        if filename.endswith('.html'):
            fname = os.path.join(DIRECTORY, filename)
            with open(fname, 'r') as f:
                soup = BeautifulSoup(f.read(), 'html.parser')
                path = soup.select_one('#tree > li > span').contents[-1]
                paths.append(path)
    return paths

But it only works if all the .html files are in one directory. What I need is to iterate over all the .html files in this directory and save what I extract, but every directory inside that directory also contains .html files that I need access to. So ideally, I need to open all of these subdirectories of my parent directory as well and save whatever I need from their .html files. Is there a way to do it?

Thanks!


2 Answers


os.walk() can help you here:

import os


def find_path(dir_):
    for root, folders, names in os.walk(dir_):
        for name in names:
            if name.endswith(".html"):
                # The full path to the file is os.path.join(root, name)
                # Your code
                pass

  • thank you. When I try to do it like this, the fname variable I'm creating gives me the full path, and I get an error: No such file or directory: .... But the files are there. Would you know how to resolve this? – acolyter11 May 27 '22 at 08:22
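
A likely cause of that error is building fname by joining the file name with the top-level directory instead of root, so files sitting in subdirectories end up pointing at paths that don't exist. As a rough sketch (not part of the original answer), this combines os.walk() with the parsing code from the question (the #tree > li > span selector comes from there; the directory argument stands in for the question's DIRECTORY constant):

import os

from bs4 import BeautifulSoup


def find_path(directory):
    paths = []
    for root, _folders, names in os.walk(directory):
        for name in names:
            if name.endswith('.html'):
                # Join with root (the folder currently being walked),
                # not the top-level directory, so nested files resolve.
                fname = os.path.join(root, name)
                with open(fname, 'r') as f:
                    soup = BeautifulSoup(f.read(), 'html.parser')
                    node = soup.select_one('#tree > li > span')
                    if node is not None:
                        paths.append(node.contents[-1])
    return paths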

You can use the sample snippet below; both #1 and #2 work:

import os

path = "."
for (root, dirs, files) in os.walk(path, topdown=True):
    for file in files:
        if file.endswith(".html"):
            print(root + "/" + file)          #1 plain string concatenation
            print(os.path.join(root, file))   #2 portable path join
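
As a side note (not from the original answers), pathlib can do the same recursive search with less code; a minimal sketch assuming the same current-directory starting point:

from pathlib import Path

# rglob("*.html") descends into every subdirectory below the starting path
for html_file in Path(".").rglob("*.html"):
    print(html_file)  # a pathlib.Path object; str(html_file) gives the path string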