Please consider this Python code:

import json
import sys
from pathlib import Path

def main():
    filenames = sys.argv[1:]

    for filename in filenames:
        path = Path(filename)
        with path.open() as file:
            text = file.read()
            a = json.loads(text)

if __name__ == "__main__":
    main()

This script works fine on Linux when called like this:

python script_name.py logs/*.txt

But on Windows, Anaconda PowerShell returns an error:

python script_name.py logs/*.txt
OSError: [Errno 22] Invalid argument: 'logs\\*.txt'

So how do I call a script on Windows with an argument that is a filename wildcard (*.txt)?

wjandrea
Theo75

1 Answer

Issue: *.ext is not a valid path

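*.ext is not a valid path but a glob, a kind of pattern used to match or find files. On Linux, the shell expands logs/*.txt into the matching file names before Python starts, so the script only ever sees concrete paths. PowerShell on Windows does not expand wildcards for external programs such as python, so the script receives the literal string logs/*.txt and fails when it tries to open it.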

Preferred Solution

Pass each argument (whether a glob expression or a concrete file path) directly to a suitable method that can either expand the path pattern (glob) or resolve the concrete path.

Pathlib with glob method

Since you already import and use pathlib, you could use its glob method like this:

from pathlib import Path

paths = list(Path('.').glob('*.txt'))
# [PosixPath('test.txt'), PosixPath('production.txt')]
for path in paths:
    with path.open() as file:
        text = file.read()

The output shown in the comment assumes there are two .txt files in your current directory, which is denoted by '.'.

Note: You could also pass relative path expressions to glob, like logs/*.txt, or even **/*.txt, which will match the files in the directory and all of its sub-directories recursively (denoted by **).
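
To illustrate the note above, here is a minimal sketch that compares a plain relative pattern with a recursive one. The directory layout and file names are made up for the demonstration, and the helper find_txt_files is hypothetical, not part of pathlib.

```python
import tempfile
from pathlib import Path


def find_txt_files(root):
    """Return all .txt files at or below root as POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob('**/*.txt'))


# Build a throwaway directory tree (assumed layout, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'logs' / 'old').mkdir(parents=True)
    (root / 'top.txt').write_text('a')
    (root / 'logs' / 'test.txt').write_text('b')
    (root / 'logs' / 'old' / 'archive.txt').write_text('c')
    (root / 'logs' / 'notes.md').write_text('d')

    # 'logs/*.txt' only looks directly inside the logs directory.
    direct = sorted(p.name for p in root.glob('logs/*.txt'))
    # '**/*.txt' also descends into every sub-directory.
    recursive = find_txt_files(root)

    print(direct)     # ['test.txt']
    print(recursive)  # ['logs/old/archive.txt', 'logs/test.txt', 'top.txt']
```

Note that the recursive pattern also matches top.txt in the starting directory itself, because ** can stand for zero directories.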

What if a user passes a concrete file-path?

Consider that a user might pass concrete file names directly as arguments. You should test whether the glob function can deal with them.

If not, you would have to check for that case and resolve those paths in a different way.
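
One way to handle both cases could look like the sketch below. The helper expand_argument and the WILDCARD_CHARS constant are hypothetical names introduced for illustration. The sketch assumes that an argument without any of the characters *, ? or [ is a concrete path, and it only supports patterns relative to a base directory (as far as I know, Path.glob rejects absolute patterns).

```python
import tempfile
from pathlib import Path

# Characters that mark an argument as a glob pattern (assumption of this sketch).
WILDCARD_CHARS = '*?['


def expand_argument(argument, base):
    """Return the paths that one command-line argument refers to, relative to base."""
    if any(char in argument for char in WILDCARD_CHARS):
        # Pattern: let pathlib expand it; the result may be empty.
        return sorted(base.glob(argument))
    # Concrete path: return it unchanged, even if the file does not exist,
    # so that opening it later raises a clear FileNotFoundError.
    return [base / argument]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'logs').mkdir()
    (base / 'logs' / 'a.txt').write_text('{}')
    (base / 'logs' / 'b.txt').write_text('{}')

    from_pattern = [p.name for p in expand_argument('logs/*.txt', base)]
    from_concrete = [p.name for p in expand_argument('logs/a.txt', base)]

    print(from_pattern)   # ['a.txt', 'b.txt']
    print(from_concrete)  # ['a.txt']
```

With this split, the same script accepts the shell-expanded file names it gets on Linux and the unexpanded pattern it gets on Windows.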

Alternative: Pure globs (jokers, wildcards) in Python

Under the hood, most of these globbing modules (like pathlib) might use Python's pure glob module. This is how it could work here, too:

import glob

filenames = glob.glob('logs/*.txt')
# ['logs/test.txt', 'logs/production.txt']

See also: Using File Extension Wildcards in os.listdir(path)

But as Charlie G advised, introducing another module is not necessary here when pathlib can do the trick (globbing).
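
If you do want to use the glob module, for example because it also accepts absolute patterns, a sketch for a whole list of arguments might look like this. The function expand_all is a hypothetical helper. Keeping an argument unchanged when nothing matches is a design choice of this sketch (it mirrors what bash does by default), not something the glob module does for you.

```python
import glob
import os
import tempfile


def expand_all(arguments):
    """Expand every argument with glob.glob; keep arguments without a match as they are."""
    expanded = []
    for argument in arguments:
        matches = sorted(glob.glob(argument))
        expanded.extend(matches if matches else [argument])
    return expanded


with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(tmp, name), 'w') as file:
            file.write('{}')

    # glob.escape protects special characters that may occur in the directory name.
    pattern = os.path.join(glob.escape(tmp), '*.txt')
    names = [os.path.basename(p) for p in expand_all([pattern, 'missing.json'])]

    print(names)  # ['a.txt', 'b.txt', 'missing.json']
```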

Handle file-name patterns in command-arguments

When passing a file-name pattern like logs/*.txt via the command-line, you should treat each argument separately.

For example, a program call from the console/shell like:

python script_name.py logs/*.txt 

would work like this:

import sys
from pathlib import Path

if __name__ == "__main__":
    # the first element (with index 0) is the program called
    path_patterns = sys.argv[1:]  # get all arguments as list by slicing
    print('got arguments:', path_patterns)

    for pattern in path_patterns:
        # glob() returns a lazy generator, so collect the matches into a list
        paths = list(Path.cwd().glob(pattern))
        print('file-pattern: ', pattern, 'globbed to paths: ', paths)

Note: the glob method requires a single pattern as a string (type str), not a list. If you pass a list to the method, like glob(path_patterns), you will get an error like:

TypeError: expected str, bytes or os.PathLike object, not list

Your sys.argv[1:] uses slicing to get all arguments passed on the command-line. So the resulting list could contain 0, 1 or multiple elements.

Validate command line arguments

If you require only a single argument (the "globbed" file-path), then use path_pattern = sys.argv[1].

Furthermore, it would be good style and defensive programming to check the number of arguments beforehand (to avoid an IndexError when the argument is missing).

This could be done like this:

import sys

# guard-statement testing for required number of arguments (program + 1 = 2)
if len(sys.argv) < 2:
    print('Requires at least a single argument, the file-path!')
    print('Usage: python script_name.py <file-path>')
    print('Example: python script_name.py logs/*.txt')
    sys.exit(1)  # a non-zero exit status signals the error to the caller

# continue because here you are sure at least 1 argument exists
print('got at least 1 required argument: ', sys.argv[1:])
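
Putting the pieces together, the script from the question could be restructured roughly as follows. This is only a sketch: load_json_files and the argv and base parameters of main are hypothetical, and the file contents are made up for the demonstration. It assumes every matched file contains valid JSON and that all patterns are relative to base.

```python
import json
import tempfile
from pathlib import Path


def load_json_files(patterns, base):
    """Expand each pattern relative to base and parse every match as JSON."""
    results = {}
    for pattern in patterns:
        for path in sorted(base.glob(pattern)):
            with path.open() as file:
                results[path.name] = json.load(file)
    return results


def main(argv, base):
    """Return an exit status: 1 if no pattern was given, 0 otherwise."""
    if len(argv) < 2:
        print('Usage: python script_name.py <file-pattern> [<file-pattern> ...]')
        return 1
    for name, data in load_json_files(argv[1:], base).items():
        print(name, data)
    return 0


# Demonstration with a temporary directory instead of real command-line arguments.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'logs').mkdir()
    (base / 'logs' / 'a.txt').write_text('{"level": "info"}')
    (base / 'logs' / 'b.txt').write_text('[1, 2]')

    loaded = load_json_files(['logs/*.txt'], base)
    status_missing = main(['script_name.py'], base)
    status_ok = main(['script_name.py', 'logs/*.txt'], base)
```

In a real script you would replace the demonstration part with a call such as sys.exit(main(sys.argv, Path.cwd())) under the usual if __name__ == "__main__": guard.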

hc_dev
  • 2
    Using `pathlib.Path.glob` should be the accepted answer rather than bringing in a separate library as it will handle OS switches natively. OP might have to do some validation of the strings brought in to determine whether wildcards are present in each item of `sys.argv[1:]`, but I think minimizing imports should be preferred. – Charlie G Sep 15 '21 at 18:01
  • 2
    Actually, you can pass each element of `sys.argv[1:]` right into `pathlib.Path.glob` since a direct match of the pattern without a wildcard should return either one or no paths. – Charlie G Sep 15 '21 at 18:05
  • 1
    @CharlieG Thanks for the heads-up. I first came up with raw use of the `glob` module, then I found the glob feature already exists in `pathlib`. I also agree with the direct match for concrete-file arguments (without wildcards in them). – hc_dev Sep 15 '21 at 19:20
  • So with this instruction: filenames = glob.glob(sys.argv[1:]) I got an error: filenames = glob.glob(sys.argv[1:]) File "C:\ProgramData\Anaconda3\lib\glob.py", line 21, in glob return list(iglob(pathname, recursive=recursive)) File "C:\ProgramData\Anaconda3\lib\glob.py", line 42, in _iglob dirname, basename = os.path.split(pathname) File "C:\ProgramData\Anaconda3\lib\ntpath.py", line 185, in split p = os.fspath(p) TypeError: expected str, bytes or os.PathLike object, not list – Theo75 Sep 15 '21 at 19:32
  • 1
    @Theo75 The error message suggests the fix: use a string instead of a list. `glob.glob(sys.argv[1])` passes a single string, whereas `sys.argv[1:]` is [Python's list slicing](https://realpython.com/lessons/indexing-and-slicing/) returning a list. See my update to handle command-line args. – hc_dev Sep 16 '21 at 08:01