Issue: *.ext is not a valid path
*.ext is not a valid path, but a glob, a kind of pattern to match or find files.
Preferred Solution
Pass the specified files (either as a glob expression or as a concrete file path) directly to a method that can either expand the pattern (glob) or resolve the concrete path.
Pathlib with glob method
Since you already import and use pathlib, you could use its glob method like this:
from pathlib import Path

paths = list(Path('.').glob('*.txt'))
# [PosixPath('test.txt'), PosixPath('production.txt')]
for path in paths:
    with path.open() as file:
        text = file.read()
The output shown in the comment line assumes there are two .txt files in your current directory (denoted by .).
Note:
You could also pass relative path expressions to glob, like logs/*.txt or even **/*.txt, which will match the files in all sub-directories recursively (denoted by **).
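To make the difference between the two kinds of pattern concrete, here is a small sketch. The directory layout (top.txt, logs/test.txt, logs/archive/old.txt) is made up for illustration and is created in a temporary directory so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory tree (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logs" / "archive").mkdir(parents=True)
    (root / "top.txt").write_text("a")
    (root / "logs" / "test.txt").write_text("b")
    (root / "logs" / "archive" / "old.txt").write_text("c")

    # Non-recursive: only files directly inside logs/
    direct = sorted(p.relative_to(root).as_posix()
                    for p in root.glob("logs/*.txt"))
    # Recursive: ** also matches "no directory at all", so top.txt is included
    recursive = sorted(p.relative_to(root).as_posix()
                       for p in root.glob("**/*.txt"))

print(direct)     # ['logs/test.txt']
print(recursive)  # ['logs/archive/old.txt', 'logs/test.txt', 'top.txt']
```

The results are sorted because glob does not promise any particular order.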
What if a user passes a concrete file-path?
Consider that a user might pass concrete file names directly as arguments. You should test whether the glob method can deal with them.
If not, you would have to check for that case and select a different path finder for it.
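One way to check this is a small experiment. The sketch below assumes a hypothetical helper, find_paths, which is not part of pathlib. It relies on two behaviours of Path.glob: a relative pattern without wildcards matches the file itself if it exists, and an absolute pattern is rejected, so absolute arguments are treated as concrete paths here.

```python
import tempfile
from pathlib import Path

def find_paths(argument, base):
    """Return the paths for one argument: a concrete file name or a glob.

    Hypothetical helper for illustration only.
    """
    candidate = Path(argument)
    if candidate.is_absolute():
        # Path.glob() does not accept absolute patterns,
        # so treat an absolute argument as a concrete path
        return [candidate] if candidate.exists() else []
    # A relative pattern without wildcards matches the file itself
    return sorted(base.glob(argument))

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.txt").write_text("x")
    (base / "b.txt").write_text("y")
    by_pattern = [p.name for p in find_paths("*.txt", base)]
    by_name = [p.name for p in find_paths("a.txt", base)]
    by_absolute = [p.name for p in find_paths(str(base / "a.txt"), base)]
    missing = find_paths("missing.txt", base)

print(by_pattern, by_name, by_absolute, missing)
# ['a.txt', 'b.txt'] ['a.txt'] ['a.txt'] []
```

Note that a name which does not exist simply yields an empty list rather than an error, so you may want to report that case to the user.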
Alternative: Pure globs (jokers, wildcards) in Python
Underneath, most of these globbing modules (like pathlib) might use Python's pure glob module. This is how it could work here, too:
import glob
filenames = glob.glob('logs/*.txt')
# ['logs/test.txt', 'logs/production.txt']
See also:
Using File Extension Wildcards in os.listdir(path)
But as Charlie G advised, introducing another module is not necessary here when pathlib can do the trick (globbing).
Handle file-name patterns in command-arguments
When passing a file-name pattern like logs/*.txt via the command line, you should treat each argument separately.
For example, a program call from the console/shell like:
python script_name.py logs/*.txt
would work like this:
import sys
from pathlib import Path

if __name__ == "__main__":
    # the first element (with index 0) is the program called
    path_patterns = sys.argv[1:]  # get all arguments as list by slicing
    print('got arguments:', path_patterns)
    for pattern in path_patterns:
        # glob() returns a lazy generator, so build a list before printing
        paths = list(Path.cwd().glob(pattern))
        print('file-pattern: ', pattern, 'globbed to paths: ', paths)
Note: it is important that the glob method requires a single pattern as a string (type str), not a list.
If you pass a list to the method, like glob(path_patterns), you will get an error like:
TypeError: expected str, bytes or os.PathLike object, not list
Your sys.argv[1:] uses slicing to get all arguments passed on the command line. So the resulting list could contain 0, 1 or multiple elements.
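Since the list of patterns can have any length, it may help to wrap the loop in a function that globs one string at a time and collects the matches. This is a minimal sketch; expand_patterns is a hypothetical helper name, and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

def expand_patterns(patterns, base):
    """Glob each pattern separately and return the unique matches in order.

    Hypothetical helper for illustration only.
    """
    seen = {}  # a dict keeps insertion order and drops duplicates
    for pattern in patterns:  # one str at a time, never the whole list
        for path in sorted(base.glob(pattern)):
            seen.setdefault(path, None)
    return list(seen)

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "logs").mkdir()
    for name in ("logs/test.txt", "logs/production.txt", "readme.md"):
        (base / name).write_text("demo")
    found = [p.relative_to(base).as_posix()
             for p in expand_patterns(["logs/*.txt", "*.md", "logs/test.*"], base)]
    nothing = expand_patterns([], base)

print(found)    # ['logs/production.txt', 'logs/test.txt', 'readme.md']
print(nothing)  # []
```

An empty list of patterns gives an empty result, and a file matched by two patterns is reported only once.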
Validate command line arguments
If you only require a single argument (the "globbed" file path), then use path_pattern = sys.argv[1].
Furthermore, it would be good style and defensive programming to check the number of arguments first (to avoid an IndexError).
This could be done like this:
import sys

# guard statement testing for the required number of arguments (program + 1 = 2)
if len(sys.argv) < 2:
    print('Requires at least a single argument, the file-path!')
    print('Usage: python script_name.py <file-path>')
    print('Example: python script_name.py logs/*.txt')
    sys.exit(1)  # non-zero exit status signals the usage error
# continue because here you are sure at least 1 argument exists
print('got at least 1 required argument: ', sys.argv[1:])
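If you want to be able to test the argument check without starting a new process, one option is to move it into a function that takes the argument list and returns an exit status. This is only a sketch: the name main and the status value 2 (the status argparse uses for usage errors) are choices made for this example.

```python
import sys

def main(argv):
    """Return an exit status: 0 on success, 2 when no argument was given."""
    if len(argv) < 2:
        # usage messages go to stderr so they do not mix with normal output
        print('Usage: python script_name.py <file-path> [<file-path> ...]',
              file=sys.stderr)
        return 2
    for pattern in argv[1:]:
        print('pattern:', pattern)
    return 0

# Call the function with hand-made argument lists instead of sys.argv
status_missing = main(['script_name.py'])
status_ok = main(['script_name.py', 'logs/*.txt'])
print(status_missing, status_ok)  # 2 0
```

In the real script you would end with sys.exit(main(sys.argv)) inside the if __name__ == "__main__": block.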
See also: