1


I need to perform some automated action on a set of XML files. I'm just learning Python so I've looked up for a similar SO answer and come up with this:

root_dir='/home/user/git/code'
for filename in glob.iglob(root_dir + '**/*.xml', recursive=True):
    print(filename)

The problem with the above code is that it finds just the top XML file which is on '/home/user/git/code' and not all those nested under that folder. The flag 'recursive' is set to true so I wonder what could be wrong with it.... Any idea? Thanks

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
Carla
  • 3,064
  • 8
  • 36
  • 65
  • 2
    you forgot `/` between `code` and `**` so you have `code**` instead of `code/**` print(filename) – furas Nov 13 '19 at 11:39
  • 1
    check this https://stackoverflow.com/a/2186565/9563006 btw, per @furas, just changing `root_dir + '**/*.xml'` to `root_dir + '/**/*.xml'` should solve your issue – Piyush Singh Nov 13 '19 at 11:40
  • @furas you should post that as an answer instead of a comment ;) – exhuma Nov 13 '19 at 11:40
  • @exhuma when I was writing it in commend I was not sure if it resolves problem :). But I checked it now and I put it as answer. – furas Nov 13 '19 at 11:45

3 Answers3

2

You forgot / between code and ** so you have code** instead of code/**

You need / at the end

 root_dir='/home/user/git/code/'

or at the beginning in

'/**/*.xml'

OR use os.path.join() instead of +

os.path.join(root_dir, '**/*.xml')
furas
  • 134,197
  • 12
  • 106
  • 148
1

I use this function endlessly for my own projects. Hope it can serve you well.

import os, glob

def get_files(path, extension, recursive=False):
    """
    A generator of filepaths for each file into path with the target extension.
    If recursive, it will loop over subfolders as well.
    """
    if not recursive:
        for file_path in glob.iglob(path + "/*." + extension):
            yield file_path
    else:
        for root, dirs, files in os.walk(path):
            for file_path in glob.iglob(root + "/*." + extension):
                yield file_path

Example: my_desktop_pdfs = list(get_files('users/xx/Desktop','pdf'))

In your case:

for f in get_files(root_dir, 'xml', recursive=True):
    print(f)
alec_djinn
  • 10,104
  • 8
  • 46
  • 71
0

I don't know about glob.iglob but os.walk should produce the same result:

import os
for root, dirs, files in os.walk('/home/user/git/code'):
    for file in files:
        if (file.endswith('.xml')):
            print(file)
Ender Look
  • 2,303
  • 2
  • 17
  • 41