I want to start by pointing out that this kind of recursive (cyclic) symlinking is a sign of bad design. Any fix would address the effect of the problem, not its cause ("sweeping the dirt under the carpet").
Unfortunately, (recursive) glob doesn't allow filtering, nor does it provide access to the elements while enumerating them (so that some of them could be skipped). So, you need another way: enumerate the dir elements yourself (using one of the many existing ways - you could take a look at [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer)) and filter out the unwanted ones.
Here's the test dir structure. Note that when the listing was taken, the 2 recursive symlinks were still regular dirs; otherwise they would have messed up the tree command too (it doesn't handle this case either). I replaced them with symlinks afterwards:
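If all you need is to skip symlinks entirely, the standard library's os.walk can do the enumeration for you. Here's a minimal sketch of that swap-in (it's not what code00.py uses): walk_no_symlinks is a hypothetical name, and fnmatch is used so the pattern can be a shell-style wildcard. os.walk won't descend into symlinked dirs when followlinks=False (the default), and removing entries from the dir name list in place is the documented way to prune the walk.

```python
import fnmatch
import os


def walk_no_symlinks(top, pattern="*.xml"):
    """Yield paths of files under top whose names match pattern, ignoring all symlinks."""
    for root, dir_names, file_names in os.walk(top, followlinks=False):
        # followlinks=False already stops os.walk from descending into symlinked dirs;
        # pruning them in place just makes that explicit
        dir_names[:] = [d for d in dir_names if not os.path.islink(os.path.join(root, d))]
        for name in file_names:
            path = os.path.join(root, name)
            # Symlinks to files are listed as well, so skip them explicitly
            if not os.path.islink(path) and fnmatch.fnmatch(name, pattern):
                yield path
```

On the tree from this answer, list(walk_no_symlinks("search_dir")) should yield the same 5 files as the "Exclude symlinks: True" run, although os.walk doesn't guarantee any particular order.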
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q057591233]> tree /a /f .
Folder PATH listing for volume Work
Volume serial number is 3655-6FED
E:\WORK\DEV\STACKOVERFLOW\Q057591233
|   code00.py
|
+---external_dir
|       file00.xml
|
\---search_dir
    |   file00.xml
    |   file01.xml
    |
    +---dir00
    |   +---dir00
    |   |   |   file00.xml
    |   |   |
    |   |   \---dir00
    |   |           file00.xml
    |   |
    |   \---dir01_symlink_to_parent_sibbling_dir01
    \---dir01
        +---dir00_symlink_to_parent_sibbling_dir00
        +---dir01
        |       file00.xml
        |
        \---dir02_symlink_to_external_dir
                file00_ext.xml
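To reproduce this setup without creating regular dirs first and swapping in symlinks afterwards, here's a minimal sketch that builds the tree directly. make_test_tree is a hypothetical helper (not part of code00.py). Assumptions: the link targets are relative (I don't know how the original links were created), and the external file is named file00_ext.xml as in the Output section, even though the listing shows file00.xml there. On Windows, os.symlink typically needs admin rights or Developer Mode.

```python
import os


def make_test_tree(base):
    """Recreate the test layout under base; the 3 *_symlink_* entries are real dir symlinks."""
    def touch(*parts):
        path = os.path.join(base, *parts)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    touch("external_dir", "file00_ext.xml")  # name taken from the Output section
    touch("search_dir", "file00.xml")
    touch("search_dir", "file01.xml")
    touch("search_dir", "dir00", "dir00", "file00.xml")
    touch("search_dir", "dir00", "dir00", "dir00", "file00.xml")
    touch("search_dir", "dir01", "dir01", "file00.xml")

    # (link location, target) - relative targets are resolved against the link's own dir
    links = [
        (("search_dir", "dir00", "dir01_symlink_to_parent_sibbling_dir01"), os.path.join("..", "dir01")),
        (("search_dir", "dir01", "dir00_symlink_to_parent_sibbling_dir00"), os.path.join("..", "dir00")),
        (("search_dir", "dir01", "dir02_symlink_to_external_dir"), os.path.join("..", "..", "external_dir")),
    ]
    for parts, target in links:
        os.symlink(target, os.path.join(base, *parts), target_is_directory=True)
```

The first 2 links point at each other's parent dirs, which is exactly what sends a naive recursive traversal into an infinite loop.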
code00.py:
#!/usr/bin/env python3

import sys
import os
import re
import pprint


def _get_files_os_scandir_no_symlikns(dir_name, match_func, level=0):
    for item in os.scandir(dir_name):
        # Ignore every symlink (to a file or to a dir), so no cycle can ever be entered
        if item.is_symlink():
            continue
        if item.is_dir():
            yield from _get_files_os_scandir_no_symlikns(item.path, match_func, level=level + 1)
        elif match_func(item.path):
            yield item.path


def _get_files_os_scandir(dir_name, match_func, visited_inodes, level=0):
    for item in os.scandir(dir_name):
        # DirEntry.inode() doesn't follow symlinks; skipping entries that were
        # already met (by inode) is what stops the cycles
        if item.inode() in visited_inodes:
            continue
        visited_inodes.add(item.inode())
        if item.is_symlink():
            # The link target may be relative to the dir containing the link
            item_path = os.path.normpath(os.path.join(os.path.dirname(item.path), os.readlink(item.path)))
        else:
            item_path = item.path
        if item.is_dir():
            yield from _get_files_os_scandir(item_path, match_func, visited_inodes, level=level + 1)
        elif match_func(item_path):
            yield item_path


def get_files(path, ext, exclude_symlinks=True):
    if exclude_symlinks and os.path.islink(path):
        return
    # Raw string avoids the invalid "\." escape sequence warning; re.escape guards special chars in ext
    pattern = re.compile(r".*\.{0:s}$".format(re.escape(ext)))
    if os.path.isdir(path):
        if exclude_symlinks:
            yield from _get_files_os_scandir_no_symlikns(path, pattern.match)
        else:
            yield from _get_files_os_scandir(path, pattern.match, set())  # set: O(1) membership tests
    elif os.path.isfile(path) and pattern.match(path):
        yield path


def main():
    search_dir = "search_dir"
    extension = "xml"
    for exclude_symlinks in [True, False]:
        print("\nExclude symlinks: {0:}".format(exclude_symlinks))
        files = list(get_files(search_dir, extension, exclude_symlinks=exclude_symlinks))
        pprint.pprint(files)
        print("Total items: {0:d}".format(len(files)))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q057591233]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Exclude symlinks: True
['search_dir\\dir00\\dir00\\dir00\\file00.xml',
'search_dir\\dir00\\dir00\\file00.xml',
'search_dir\\dir01\\dir01\\file00.xml',
'search_dir\\file00.xml',
'search_dir\\file01.xml']
Total items: 5
Exclude symlinks: False
['search_dir\\dir00\\dir00\\dir00\\file00.xml',
'search_dir\\dir00\\dir00\\file00.xml',
'search_dir\\dir01\\dir01\\file00.xml',
'external_dir\\file00_ext.xml',
'search_dir\\file00.xml',
'search_dir\\file01.xml']
Total items: 6
Done.
Notes:
- The recursive implementation relies on [Python 3.Docs]: os.scandir(path='.') (and other file / dir functions)
- os.scandir doesn't do any file name filtering (there's no wildcard support), so the closest (?) thing, a regexp, is used to match the file names
- The 2 functions traversing the dir:
    - _get_files_os_scandir_no_symlikns - ignores all symlinks
    - _get_files_os_scandir - includes symlinks. It also resolves them (relative to the dir containing the link) and keeps track of the visited inodes, to avoid infinite recursion
    - The 2 functions could have been unified (with an extra argument (exclude_symlinks)), but I have a feeling that, kept separate, the one ignoring symlinks is faster
    - As seen in the output, neither of them enters infinite recursion (for the former it's obvious), but the former also omits the file located outside the search dir (only reachable via dir02_symlink_to_external_dir)
- get_files - a wrapper that calls one of the 2, after doing some initialization work (so that it isn't repeated by each recursive call)
- I only ran the code on Win, but I ran parts of it on Nix as well, so I'm not expecting any surprises there
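_get_files_os_scandir breaks cycles by remembering the inode of every entry it meets (the symlink's own inode, since DirEntry.inode() doesn't follow links). A common alternative is to identify each directory by the (st_dev, st_ino) pair of its target, as returned by os.stat (which follows symlinks), and to refuse to enter the same directory twice. Here's a minimal sketch of that variant; walk_follow_symlinks is a hypothetical name, and it assumes the platform fills st_dev / st_ino meaningfully.

```python
import fnmatch
import os


def walk_follow_symlinks(top, pattern="*.xml"):
    """Yield matching file paths under top, following dir symlinks but entering each real dir only once."""
    seen_dirs = set()  # (st_dev, st_ino) pairs of the directories already entered

    def _walk(dir_path):
        st = os.stat(dir_path)  # follows symlinks, so this identifies the *target* dir
        key = (st.st_dev, st.st_ino)
        if key in seen_dirs:
            return  # reached again via a link: a cycle or a duplicate path
        seen_dirs.add(key)
        with os.scandir(dir_path) as it:
            entries = sorted(it, key=lambda e: e.name)  # deterministic order
        for entry in entries:
            if entry.is_dir():  # follows symlinks (follow_symlinks=True by default)
                yield from _walk(entry.path)
            elif entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                yield entry.path

    yield from _walk(top)
```

Unlike _get_files_os_scandir, this doesn't rewrite paths: a dir reachable both directly and through a link is reported under whichever path reaches it first (in this answer's tree, dir01's content would show up under dir00\dir01_symlink_to_parent_sibbling_dir01). Apply os.path.realpath to the results if canonical paths are needed.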