I have the following Python script to search for files with a particular extension:

import fnmatch
import os

def list_xml_reports(d, extensions):
    matches = []
    print(extensions)
    for root, dirname, filenames in os.walk(d):
        for extension in extensions:
            for filename in fnmatch.filter(filenames, extension):
                matches.append(os.path.join(root, filename))
    return set(matches)

print(list_xml_reports("/root/my_dir", ("*junitReport.xml")))

However, instead of returning only the files that match *junitReport.xml, it returns everything (*.log, *build.xml, *.txt, *changelog.xml). Why is that happening?

cybertextron

2 Answers

I would use Python's os module. My files are in the directory /Users/ach/Documents/Data_Science/img/.

Version 1:

Elegant version:

import os
DIR_WITH_FILES = '/Users/ach/Documents/Data_Science/img'
# endswith(".png") also rejects names that merely end in the letters "png"
list_png = [f for f in sorted(os.listdir(DIR_WITH_FILES)) if f.endswith(".png")]
list_png_with_path = [os.path.join(DIR_WITH_FILES, f) for f in list_png]

Version 2:

Step by Step:

code_1: Import os and set the path of the directory where your files are located

import os

print(os.getcwd())
MY_WORKING_DIRECTORY = os.getcwd()
# "!ls" is an IPython/Jupyter shell escape, not plain Python; it lists the current directory
!ls
DIR_WITH_FILES = MY_WORKING_DIRECTORY + '/img/'
print(DIR_WITH_FILES)

output_1:

/Users/ach/Documents/Data_Science
img                                notebooks
/Users/ach/Documents/Data_Science/img/

code_2: List the files of the directory DIR_WITH_FILES=/Users/ach/Documents/Data_Science/img

files=sorted(os.listdir(DIR_WITH_FILES))
print(files)

output_2:

['fig_0.png', 'fig_describe_target_0.png', 'fig_describe_target_1.png', 'test.txt']

code_3: Create two lists: list_png, with the names of the files that have the extension png, and list_png_with_path, with the same file names including their paths:

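list_png = []
list_png_with_path = []

for f in sorted(os.listdir(DIR_WITH_FILES)):
    # endswith(".png") also rejects names that merely end in the letters "png"
    if f.endswith(".png"):
        list_png.append(f)
        # DIR_WITH_FILES already ends with '/', so plain concatenation works
        png_with_path = DIR_WITH_FILES + f
        list_png_with_path.append(png_with_path)

print(list_png)
print(list_png_with_path)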

output_3:

['fig_0.png', 'fig_describe_target_0.png', 'fig_describe_target_1.png']

['/Users/ach/Documents/Data_Science/img/fig_0.png', '/Users/ach/Documents/Data_Science/img/fig_describe_target_0.png', '/Users/ach/Documents/Data_Science/img/fig_describe_target_1.png']

You can see the file 'test.txt' is not included in the lists.

If you want to filter on several extensions, repeat the loop from code_3 once per extension.
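
As an alternative to repeating the loop, `str.endswith` accepts a tuple of suffixes, so several extensions can be checked in one pass. The following is only a minimal sketch: `filter_by_extension` is a hypothetical helper name, and 'photo.jpg' is a made-up file name added to some of the names from output_2.

```python
def filter_by_extension(names, extensions):
    """Hypothetical helper: keep names ending in any of the given extensions.

    Pass a list or tuple of suffixes such as ['.png', '.jpg'], not a bare string.
    """
    # str.endswith requires a tuple (not a list) when given several suffixes.
    return [n for n in sorted(names) if n.endswith(tuple(extensions))]

# Names taken from output_2, plus 'photo.jpg', which is made up for illustration.
names = ['fig_0.png', 'fig_describe_target_0.png', 'test.txt', 'photo.jpg']
print(filter_by_extension(names, ['.png', '.jpg']))
# ['fig_0.png', 'fig_describe_target_0.png', 'photo.jpg']
```

With a real directory, the same helper could be called as filter_by_extension(os.listdir(DIR_WITH_FILES), ['.png', '.jpg']) after importing os.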

See also: Get a filtered list of files in a directory

pink.slash

The expression ("*junitReport.xml") is the same as "*junitReport.xml": parentheses alone do not create a tuple. Either add a comma, ("*junitReport.xml",), or turn it into a list, ["*junitReport.xml"].
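
A quick way to confirm this in an interpreter, as a minimal sketch (the variable names are illustrative only):

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
not_a_tuple = ("*junitReport.xml")
one_tuple = ("*junitReport.xml",)
as_list = ["*junitReport.xml"]

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(len(not_a_tuple), len(one_tuple), len(as_list))  # 16 1 1
```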

What happened is that extensions received the string "*junitReport.xml", and when you looped over it, extension took the values '*', 'j', 'u', and so on.

The first value, '*', matched everything.
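
The effect can be reproduced without touching the filesystem. In this sketch the file names are made up for illustration, and `list_matching` is a hypothetical helper (not part of the original script) that shows one possible guard: wrapping a lone string in a list before looping.

```python
import fnmatch

# Made-up file names standing in for one directory listing from os.walk().
filenames = ["a_junitReport.xml", "build.xml", "run.log", "notes.txt"]

# Iterating over a string yields its characters, so the first "pattern" is '*'.
patterns = ("*junitReport.xml")  # really a str, not a tuple
print(list(patterns)[:3])                      # ['*', 'j', 'u']
print(fnmatch.filter(filenames, patterns[0]))  # '*' matches every name

def list_matching(names, patterns):
    """Hypothetical helper: accept one pattern string or an iterable of patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]  # guard against the missing-comma mistake
    return sorted({n for p in patterns for n in fnmatch.filter(names, p)})

print(list_matching(filenames, ("*junitReport.xml")))   # ['a_junitReport.xml']
print(list_matching(filenames, ("*junitReport.xml",)))  # ['a_junitReport.xml']
```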

Brent Washburne