I'm experimenting with * and ** to figure out what the use cases of these operators are. For this "study", I wrote a function scandir_and_execute that traverses a directory (recursively by default) and executes a function exec_func on each file it encounters. The function is configurable: when calling scandir_and_execute, the programmer indicates which function to run on every file. In addition, to figure out *, I added a func_args parameter (defaults to an empty list) that can hold any number of arguments.

The idea is that the programmer can use any exec_func that they have defined (or built-in) to which the file is the first argument, and that they provide the needed arguments themselves, in a list, which is then expanded on the exec_func call.

Note: at least Python 3.6 is required to run this function (os.scandir was added in 3.5, f-strings in 3.6).

import os

def scandir_and_execute(root, exec_func, func_args=[], recursive=True, verbose=False):
    if verbose:
        print(f"TRAVERSING {root}")

    # Use scandir to return iterator rather than list
    for entry in os.scandir(root):
        if entry.is_dir() and not entry.name.startswith('.'):
            if recursive:
                scandir_and_execute(entry.path, exec_func, func_args, True, verbose)
        elif entry.is_file():
            if verbose:
                print(f"\tProcessing {entry.name}")

            # Unpack (splat) argument list, i.e. turn func_args into separate arguments and run exec_func
            exec_func(entry.path, *func_args)

Is this the correct way to use *, or am I misinterpreting the documentation and the concept of the operator? The function works, as far as I have tested it, but perhaps there are some caveats or non-pythonic things that I did? For instance, would it be better to write the function like this where the unnamed "superfluous" arguments are tupled together (or another way)?

def scandir_and_execute(root, exec_func, recursive=True, verbose=False, *func_args):
Bram Vanroy
  • Note that `func(args=[])` is not the same as `func(*args)`. You'd call the first as `func([a, b, c])` and the latter as `func(a, b, c)`. Also, note that using `[]` as a default is not good practice (but probably not a problem in this case) as that will be _the same_ list instance each time you call the function. – tobias_k Jan 12 '18 at 09:22
  • @tobias_k I was aware of the first comment, not of the second! Could you give me any pointers what to change, and where I can find more information on this? – Bram Vanroy Jan 12 '18 at 09:28
  • @BramVanroy https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument – Chris_Rands Jan 12 '18 at 09:32
  • using `*` is the same as unpacking iterable objects in passing parameters. for example, `a = [1, 2, 3]`, `*a => 1, 2, 3`. – Sraw Jan 12 '18 at 09:32
  • I’m thinking this is a C-ism? You should drop `func_args` and let people pass in `exec_func=lambda x: foo(x, …)` if they need that. – Ry- Jan 12 '18 at 09:34
  • @Ryan Not sure that I follow. The idea of `exec_func` is that it's executed for every file *with the file path as the first argument*. Not sure how I'd specify that given your comment. Could you perhaps elaborate in an answer? – Bram Vanroy Jan 12 '18 at 09:40
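
The points raised in the comments, together with the alternative signature from the question, can be sketched as follows. This is a minimal illustration rather than part of the original post: `collect`, `collect_safe`, and `run` are hypothetical names made up for the demonstration.

```python
def collect(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same object.
    bucket.append(item)
    return bucket


def collect_safe(item, bucket=None):
    # Common idiom: use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def run(root, exec_func, recursive=True, verbose=False, *func_args):
    # Mirrors the alternative signature from the question: extra positional
    # arguments only reach func_args after recursive and verbose are filled.
    return recursive, verbose, func_args


first = collect("a")
second = collect("b")  # same list object as `first`
safe_first = collect_safe("a")
safe_second = collect_safe("b")

# Intended as two extra arguments, but they land in recursive and verbose.
swallowed = run("some/dir", print, "arg1", "arg2")
# All four leading slots must be given positionally before func_args gets anything.
explicit = run("some/dir", print, True, False, "arg1", "arg2")
```

Since scandir_and_execute only reads func_args and never mutates it, the shared default does no harm there, as the comment says. A layout such as `def run(root, exec_func, *func_args, recursive=True, verbose=False)` would avoid the swallowing problem, at the cost of requiring `recursive` and `verbose` to be passed by name.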

1 Answer

That is how you use the splat operator, but consider whether it needs to be your function’s responsibility to pass arguments at all. Say you’re using it like this now:

scandir_and_execute(root, foo, (foo_arg1, foo_arg2), recursive=True)

you can rewrite scandir_and_execute to accept a callable taking one argument:

def scandir_and_execute(root, exec_func, recursive=True, verbose=False):
    if verbose:
        print(f"TRAVERSING {root}")

    # Use scandir to return iterator rather than list
    for entry in os.scandir(root):
        if entry.is_dir() and not entry.name.startswith('.'):
            if recursive:
                scandir_and_execute(entry.path, exec_func, True, verbose)
        elif entry.is_file():
            if verbose:
                print(f"\tProcessing {entry.name}")

            exec_func(entry.path)

and let the caller handle its business:

scandir_and_execute(root, lambda path: foo(path, foo_arg1, foo_arg2))
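
If a lambda feels heavy, `functools.partial` from the standard library is another option, with one caveat: it binds positional arguments from the left, so the extra arguments have to be bound by keyword for the path to stay first. A small sketch, where `foo` and its parameter names `prefix` and `suffix` are hypothetical stand-ins:

```python
from functools import partial


def foo(path, prefix, suffix):
    # Hypothetical per-file function: the path comes first, extras follow.
    return f"{prefix}{path}{suffix}"


# Binding by keyword leaves the first positional slot free for the path.
by_partial = partial(foo, prefix="<", suffix=">")
by_lambda = lambda path: foo(path, "<", ">")

result_partial = by_partial("a.txt")
result_lambda = by_lambda("a.txt")
```

Either callable can be handed to scandir_and_execute as exec_func; the lambda is the more general of the two because it does not depend on the parameter names of `foo`.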

Then drop the callback entirely and make a generator:

def scandir(root, recursive=True, verbose=False):
    if verbose:
        print(f"TRAVERSING {root}")

    # Use scandir to return iterator rather than list
    for entry in os.scandir(root):
        if entry.is_dir() and not entry.name.startswith('.'):
            if recursive:
                yield from scandir(entry.path, True, verbose)
        elif entry.is_file():
            if verbose:
                print(f"\tProcessing {entry.name}")

            yield entry.path

for path in scandir(root, recursive=True):
    foo(path, foo_arg1, foo_arg2)

(Close to os.walk, but not quite!) Now the non-recursive version is just this generator:

(entry.path for entry in os.scandir(root) if entry.is_file())

so you may as well provide only the recursive version:

import os


def is_hidden(dir_entry):
    return dir_entry.name.startswith('.')


def scandir_recursive(root, *, exclude_dir=is_hidden):
    for entry in os.scandir(root):
        yield entry

        if entry.is_dir() and not exclude_dir(entry):
            yield from scandir_recursive(entry.path, exclude_dir=exclude_dir)

import logging

logging.basicConfig(level=logging.INFO)  # the root logger drops INFO messages unless configured
logging.info(f'TRAVERSING {root}')

for entry in scandir_recursive(root):
    if entry.is_dir():
        logging.info(f'TRAVERSING {entry.path}')
    elif entry.is_file():
        logging.info(f'\tProcessing {entry.name}')
        foo(entry.path, foo_arg1, foo_arg2)
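
To see how `exclude_dir` behaves, here is a self-contained sketch that runs the generator against a throwaway directory tree. The tree layout and the `relative_files` helper are assumptions made up for the demonstration; only `is_hidden` and `scandir_recursive` come from the code above.

```python
import os
import tempfile


def is_hidden(dir_entry):
    return dir_entry.name.startswith('.')


def scandir_recursive(root, *, exclude_dir=is_hidden):
    for entry in os.scandir(root):
        yield entry

        if entry.is_dir() and not exclude_dir(entry):
            yield from scandir_recursive(entry.path, exclude_dir=exclude_dir)


def relative_files(root, **kwargs):
    # Hypothetical helper: sorted file paths relative to root, using "/" separators.
    return sorted(
        os.path.relpath(entry.path, root).replace(os.sep, "/")
        for entry in scandir_recursive(root, **kwargs)
        if entry.is_file()
    )


with tempfile.TemporaryDirectory() as root:
    # Throwaway tree: one visible subdirectory and one hidden one.
    os.makedirs(os.path.join(root, "sub"))
    os.makedirs(os.path.join(root, ".git"))
    for rel in ("a.txt", "sub/b.txt", ".git/config"):
        with open(os.path.join(root, *rel.split("/")), "w") as handle:
            handle.write("x")

    # Default: hidden directories are not descended into.
    default_files = relative_files(root)
    # Never exclude anything: the hidden directory is traversed too.
    all_files = relative_files(root, exclude_dir=lambda entry: False)
```

Note that a hidden directory is still yielded as an entry; `exclude_dir` only stops the descent into it, so the caller decides what to do with the entry itself.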
Ry-
  • I understand what you're saying, thanks. For me, still getting my head around generators and `yield`, I prefer the readability of the first example (with the lambda expression). Would there be a performance gain using a generator (your example) in comparison to my initial function? – Bram Vanroy Jan 12 '18 at 11:32
  • @BramVanroy: No performance difference; the separated/generator version is just for improved readability and flexibility. – Ry- Jan 12 '18 at 12:22