-1

I wanted to make os.walk based list comprehension (latter to be turned into a generator expression) with full control, but couldn't find a way. Have I hit the limit of Python's expressive power or is my vocabulary of tricks just lacking ? :-)

Here's what I've got and even for that I had to hack a bit (short circuit logical expression and forced True) because I could find a way to inject anything with assignment operator. No matter how much I tried to bury it, parser would find me :-) Also couldn't inject print anywhere (as a debugging aid).

[( s, re.sub(r"^.*?FStar\.(.*)\.fs(.*)", dest_dir + r"Fstar.\1.fst\2", s) )
     for s in ( x[0].replace('\\', '/') + "/" + f
        for x in os.walk(".")
            if (skip_dir in x[1] and x[1].remove(skip_dir) or True)
            for f in x[2] if fnmatch(f ,patt))
]

What I wanted was full control of dirs along the lines of:

x[1][:]=[d for d in x[1] if d not in skip_list]

And that got me thinking about the limits of Python's expressiveness.

Any ideas for this particular case appreciated, but also if someone knows/has an example of a construct offering even more potential control than os.walk only to be faced with limits of comprehension nesting/combining.

Also I started wishing for pipes or other form of generator combinators while writing this.

The ultimate limits of expressiveness in a comprehension structure (and in general in syntactic constructs) is the bigger question here.

For those asking for clarification - the first question is how to do achieve control of os.walk in a comprehension, to be able to remove multiple items (coming from skip_list) and not just one, the second is what are the limits of Pythons expressive power - what else, could be done within a nested comprehension structure. Assignments for example.

ZXX
  • 4,684
  • 27
  • 35
  • I'm not sure I understand the question but complex nested comprehensions are not pythonic anyway - Python's philosophy is about readability first, and your example snippet is hardly readable even for a long time Python user. – bruno desthuilliers Jun 07 '17 at 09:18
  • can you give us example of input/output? – Azat Ibrakov Jun 07 '17 at 09:18
  • also, use generator functions instead of comprehensions if you need something complex – Azat Ibrakov Jun 07 '17 at 09:19
  • This would get thrown out under any code review practice out there. Use several generator functions and `filter()` – user2722968 Jun 07 '17 at 09:20
  • `(skip_dir in x[1] and x[1].remove(skip_dir) or True)` <- what this line supposed to do? – Azat Ibrakov Jun 07 '17 at 09:29
  • @brunodesthuilliers I had to read it thrice to grok it and still no idea what this question is... sounds more like a "I want to do a one-liner that no-one will understand, doesn't let me debug it etc..." – Jon Clements Jun 07 '17 at 10:48

1 Answers1

1

All of Python users should be familiar with Zen.

In this case I can see violation of next rules

  • Readability counts.

    given example is too hard to reason about.

  • Simple is better than complex.

    Sparse is better than dense.

    there are too many statements in one comprehension.

  • Flat is better than nested.

    and we've got nested comprehensions here.

Overview

  • side effects inside of comprehensions is not a good idea

    x[1].remove(skip_dir)
    

    see discussion here

  • instead of string concatenation

    x[0].replace('\\', '/') + "/" + f
    

    we use os.path.join

    os.path.join(x[0].replace('\\', '/'), f)
    
  • about statement

    for x in os.walk(".") 
    

    os.walk yields tuples with names of root directory, subdirectories and files, so it will be better to unpack tuple instead of accessing coordinates by index

    for root, directories_names, files_names in os.walk('.')
    
  • if we are using the same regular expression many times then it deserves to be compiled before using, so

    ... re.sub(r"^.*?FStar\.(.*)\.fs(.*)", dest_dir + r"Fstar.\1.fst\2", s) ...
    

    can be separated into

    TARGET_FILES_NAMES_RE = re.compile(r"^.*?FStar\.(.*)\.fs(.*)")
    
    ... TARGET_FILES_NAMES_RE.sub(dest_dir + r"Fstar.\1.fst\2", s) ...
    

    also it is unlear what should dest_dir + r"Fstar.\1.fst\2" do: i believe that it should join dest_dir to simplified file name.

Main idea

When comprehension becomes complex (or especially complicated) it is a good idea to rewrite it into generator function.

For given example it may be like

TARGET_FILES_NAMES_RE = re.compile(r"^.*?FStar\.(.*)\.fs(.*)")


def modify_paths(top, dest_dir, skip_dir, pattern):
    replacement = os.path.join(dest_dir, r"Fstar.\1.fst\2")
    for root, directories_names, files_names in os.walk(top):
        try:
            # we are removing `skip_dir` from all subdirectories,
            # is it a desired behavior?
            directories_names.remove(skip_dir)
        except ValueError:
            # not in list
            pass

        for file_name in files_names:
            if not fnmatch(file_name, pattern):
                continue
            s = os.path.join(root.replace('\\', '/'), file_name)
            yield (s,
                   TARGET_FILES_NAMES_RE.sub(replacement, s))

but it is still raw and should be refactored.

Test

I've created directory

/
    doc.txt
    FStar.anotherfs
    FStar.other.fS
    FStar.some.fs90
    FStar.text..fs
    test.py

    skip_me/
            FStar.file.fs
            FStar.sample.fs
            FStar.skipped.fs

where test.py contents:

import os
import re
from fnmatch import fnmatch

TARGET_FILES_NAMES_RE = re.compile(r"^.*?FStar\.(.*)\.fs(.*)")


def modify_paths(top, dest_dir, skip_dir, pattern):
    replacement = os.path.join(dest_dir, r"Fstar.\1.fst\2")
    for root, directories_names, files_names in os.walk(top):
        try:
            # we are removing `skip_dir` from all subdirectories,
            # is it a desired behavior?
            directories_names.remove(skip_dir)
        except ValueError:
            # not in list
            pass

        for file_name in files_names:
            if not fnmatch(file_name, pattern):
                continue
            s = os.path.join(root.replace('\\', '/'), file_name)
            yield (s,
                   TARGET_FILES_NAMES_RE.sub(replacement, s))


if __name__ == '__main__':
    top = '.'
    skip_dir = 'skip_me'
    patt = '*'
    # slash is required for version with list comprehension
    dest_dir = 'dest_dir/'
    before = [
        (s,
         re.sub(r"^.*?FStar\.(.*)\.fs(.*)", dest_dir + r"Fstar.\1.fst\2", s))
        for s in (os.path.join(x[0].replace('\\', '/'), f)
                  for x in os.walk(top)
                  if (skip_dir in x[1] and
                      x[1].remove(skip_dir) or True)
                  for f in x[2] if fnmatch(f, patt))]
    after = list(
        modify_paths(top=top, dest_dir=dest_dir, skip_dir=skip_dir,
                     pattern=patt))
    assert after == before

and assertion is successful.

P. S.

Skipping exceptions is not a good idea but if you know what to expect it can become a powerful tool.

Also it can be re-written using contextlib.suppress context manager from

try:
    directories_names.remove(skip_dir)
except ValueError:
    pass

to

with suppress(ValueError):
    directories_names.remove(skip_dir)
Azat Ibrakov
  • 9,998
  • 9
  • 38
  • 50
  • Yeah you can always write a function and to in it whatever you like. But in doing so you've missed the whole point of the question. And for what? To dump out endless diatribe on the subject of "I hate compact compositions, I love long files and functions and making people read 100 lines of code and go hunting my imports/references". I'm sure you'd eventually end up with a whole package for a 4-liner :-) These kinds of constructs get constructed from the bottom up - the top level is just example of what can be composed. – ZXX Jun 07 '17 at 18:32
  • @ZXX: your example is not 4-liner at all, writing smallest scripts possible is not what Python about, we're not writing "whole package", just 1 function instead of unreadable messy list comprehension – Azat Ibrakov Jun 07 '17 at 18:36
  • Go back and count how many lines of code you've dumped here and how many lines of preachy prose. The only thing you have doing here is expounding your arrogance and totalitarian thinking. If you don't like particular style, decent way is not to respond. Instead you hijack a question and try to start a flame war. BTW you haven't answered the question at all. Show your great "expertise" - solve the problem. Much harder than badmouthing, isn't it? At the very least try to understand why was that trailing True needed. Too much to ask I suppose. – ZXX Jun 08 '17 at 00:29
  • @ZXX: 14 lines (+ 2 empty lines between object and function definition and 3 lines of comments since i don't understand what you are trying to achieve) of more or less readable code vs 6 lines of unpredictable chaos, i prefer readability – Azat Ibrakov Jun 08 '17 at 00:37
  • @ZXX: if you work totally alone/hate your co-workers, than just fine, write as small and dense code as possible, but we are writing code not only to use it, but read it, to make it simple for readers to understand what is going on, to make it easier to improve, because world is dynamic and code should be changeable to satisfy new requirements – Azat Ibrakov Jun 08 '17 at 00:44
  • @ZXX: you are the only one arrogant here, i'm just trying to help rethink given approach, because it is a language misusing – Azat Ibrakov Jun 08 '17 at 00:50