I have the following directory structure:

Folder_One
├─file1.txt
├─file1.doc
└─file2.txt
Folder_Two
├─file2.txt
├─file2.doc
└─file3.txt

I would like to list only the .txt files in each folder. For example:

Folder_One-> file1.txt and file2.txt
Folder_Two-> file2.txt and file3.txt

Note: This entire directory structure is inside a folder called dataset. My code looks like this, but I believe something is missing. Can someone help me?

import glob
import os

path_dataset = "./dataset/"
filedataset = os.listdir(path_dataset)

for i in filedataset:
    pasta = ''
    pasta = pasta.join(i)
    for file in glob.glob(path_dataset+"*.txt"):
        print(file)
simonica

3 Answers

from pathlib import Path

for path in Path('dataset').rglob('*.txt'):
    print(path.name)

Using glob

import glob
for x in glob.glob('dataset/**/*.txt', recursive=True):
    print(x)
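
Both snippets above list every match in one flat stream, while the question asks for the files grouped per folder. A minimal sketch of that grouping with pathlib follows; the helper name `txt_files_by_folder` is hypothetical, and it assumes the .txt files sit exactly one level below the dataset folder, as in the question.

```python
from collections import defaultdict
from pathlib import Path


def txt_files_by_folder(root):
    """Map each immediate subfolder of `root` to its sorted .txt file names.

    Hypothetical helper: assumes the files of interest sit exactly one
    level below `root`, as in the layout shown in the question.
    """
    grouped = defaultdict(list)
    # "*/*.txt" matches entries ending in .txt one directory below root
    for path in Path(root).glob("*/*.txt"):
        if path.is_file():  # skip directories that happen to end in .txt
            grouped[path.parent.name].append(path.name)
    # Sort folders and file names so the result is deterministic
    return {folder: sorted(names) for folder, names in sorted(grouped.items())}
```

For the layout in the question, `txt_files_by_folder("dataset")` should map `Folder_One` to `['file1.txt', 'file2.txt']` and `Folder_Two` to `['file2.txt', 'file3.txt']`, which can then be printed in the `Folder-> a and b` form with `' and '.join(names)`.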
bigbounty

You can use the re module to check that a filename ends with .txt.

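import os
import re

path_dataset = "./dataset/"

for folder in os.listdir(path_dataset):
    folder_path = os.path.join(path_dataset, folder)
    # Only look inside subdirectories; skip loose files in dataset
    if os.path.isdir(folder_path):
        for file in os.listdir(folder_path):
            # Keep names that end with ".txt"
            if re.match(r".*\.txt$", file):
                print(folder + '->' + file)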
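
For a fixed suffix like this, a regular expression is not strictly needed: `str.endswith` performs the same check. Below is a sketch of the same loop under that substitution. The function name `list_txt_per_folder` is hypothetical, and returning pairs instead of printing is only a choice made here so the result can be reused.

```python
import os


def list_txt_per_folder(path_dataset):
    """Return (folder, filename) pairs for .txt files one level below path_dataset.

    Hypothetical helper: swaps the regex for str.endswith.
    """
    pairs = []
    for folder in sorted(os.listdir(path_dataset)):
        folder_path = os.path.join(path_dataset, folder)
        # Skip anything in the dataset folder that is not a directory
        if not os.path.isdir(folder_path):
            continue
        for name in sorted(os.listdir(folder_path)):
            # Plain suffix check, no regular expression required
            if name.endswith(".txt"):
                pairs.append((folder, name))
    return pairs
```

Each pair can then be printed as before, for example with `print(folder + '->' + name)` while iterating over `list_txt_per_folder("./dataset/")`.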
ThunderPhoenix

Another option is os.walk from the os module, which is convenient if you are already using that module:

import os

# Start from the current directory; an absolute path also works
path = os.getcwd()
# Walk recursively through all folders below path
for root, dirs, files in os.walk(path):
    # Keep only the files whose names end with ".txt"
    check = [f for f in files if f.endswith(".txt")]
    if check:
        print(root, check)
Aroc