3

I have a very long question with what may be a very short answer.

I'm very new to Python (almost two weeks, now) but I've used VBScript for many years so I understand many of the basic concepts.

I've searched Stack Overflow and the internet for a solution but haven't been able to find anything; I'm not sure if this is even possible in Python, but I'd be a bit surprised if it's not.

I've written a file-search program using Python 3 that allows users to search their computer for files. The user can choose to search based on several different parameters: name, size range, date-modified range and, for non-Linux systems, date-created range. The search functionality works quite well for each individual parameter as well as for combinations of parameters (which, by the way, is in no small part thanks to the many answers / discussions I've found here on Stack Overflow).

My problem is that the actual search is rather inelegant and, I believe, slower than it could be. The program uses flags (NOT true Python flags, that's just what I happen to call them) to set the search options. Let me illustrate with some pseudo code:

    # Variables get their values from user entry
    sName = "test" # string to search for
    sMinSize = 2 # minimum search-by size in MB
    sMaxSize = 15 # maximum search-by size in MB
    sModded1 = "2008-01-23" # earliest modified-by date
    sModded2 = "2017-08-22" # latest modified-by date

    sCreated1 = "2008-01-23" # earliest created-by date
    sCreated2 = "2017-08-22" # latest created-by date

    # Search parameters - choosing one of these changes the value from 0 to 1:
    flagName = 0 # search by name
    flagSize = 0 # search by size
    flagModified = 0 # search by last modified date
    flagCreated = 0 # search by date created

    for root, dirs, files in os.walk(strPath, followlinks=False):
        for fName in files:
            fPath = os.path.join(root, fName) # os.walk yields bare file names
            fileDate = os.path.getmtime(fPath)
            fileSize = os.stat(fPath).st_size
            if flagName == 1:
                if fName.find(sName) != -1:
                    do_stuff
            elif flagSize == 1:
                if sMinSize < fileSize < sMaxSize:
                    do_stuff
            elif flagName == 1 and flagSize == 1:
                if fName.find(sName) != -1 and sMinSize < fileSize < sMaxSize:
                    do_stuff
    ... etc

That's only for 3 possible combinations - there are 14 total. While I don't really have a problem with typing all the combinations out, I believe this would severely impact the speed and efficiency of the search.

I've thought of another solution that is a bit more elegant and would probably execute faster, but I still think there's a better method:

    if flagName == 1:
        for root, dirs, files in os.walk(strPath, followlinks=False):
            for fName in files:
                fPath = os.path.join(root, fName) # os.walk yields bare file names
                fileDate = os.path.getmtime(fPath)
                fileSize = os.stat(fPath).st_size
                if fName.find(sName) != -1:
                    do_stuff

    elif flagName == 1 and flagSize == 1:
        for root, dirs, files in os.walk(strPath, followlinks=False):
            for fName in files:
                fPath = os.path.join(root, fName)
                fileDate = os.path.getmtime(fPath)
                fileSize = os.stat(fPath).st_size
                if fName.find(sName) != -1 and sMinSize < fileSize < sMaxSize:
                    do_stuff
    ... etc

Again, this is a bit more elegant and (I believe) a great deal more efficient, but still not ideal. What I'd like to do is create ONE "if" statement based on the user's search criteria and use that to conduct the search (note that something similar is possible in VBScript). These statements would go BEFORE the search statements take place:

Possible option 1:

    if flagName == 1:
        iClause = "fName.find(sName) != -1"
    elif flagName == 1 and flagSize == 1:
        iClause = "fName.find(sName) != -1 and sMinSize < fileSize < sMaxSize"
    ... etc

Possible option 2:

    flagClause = 0
    if flagName == 1:
        iClause = "fName.find(sName) != -1"
        flagClause = flagClause + 1
    if flagSize == 1:
        if flagClause == 0:
            iClause = "sMinSize < fileSize < sMaxSize"
        else:
            iClause = iClause + " and sMinSize < fileSize < sMaxSize"
        flagClause = flagClause + 1
    ... etc

And then plug "iClause" in to my search statement like so:

    for root, dirs, files in os.walk(strPath, followlinks=False):
        for fName in files:
            fPath = os.path.join(root, fName) # os.walk yields bare file names
            fileDate = os.path.getmtime(fPath)
            fileSize = os.stat(fPath).st_size
            if iClause: # the clause built before the loop
                do_stuff

This would streamline the code, making it easier to read and maintain and (I believe) make it more efficient and speedy.

Is this possible with Python?


Edit:

I thank all of you for taking the time to read my lengthy question, but I don't believe you got what I was asking - most likely due to its (over)verbosity.

I would like to know how to implement the following:

    a = "sMinSize < fileSize < sMaxSize"
    b = " and sMinSize < fileSize < sMaxSize"
    iClause = a+b

Then plug 'iClause' into my "if" statement as follows:

    if iClause:
        do_stuff

This would basically be turning a string literal into a variable, then using that variablized (probably not a real word) string literal as my statement. I hope that was clearer.
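
For what it's worth, here is a small sketch of why `if iClause:` cannot work as written. It assumes nothing beyond standard Python truthiness rules; `i_clause` and `branch_taken` are made-up names for illustration only. A non-empty string is always truthy, so the `if` never looks at what the string says:

```python
# Hypothetical clause text: clearly false if it were evaluated as code.
i_clause = "1 > 2"


def branch_taken(condition):
    """Return True when `if condition:` would enter its body."""
    if condition:
        return True
    return False


# The string itself is tested, not the expression it spells out.
string_result = branch_taken(i_clause)

# The same comparison written as real code is evaluated normally.
value_result = branch_taken(1 > 2)

print(string_result, value_result)
```

So the string would have to be turned back into code somehow before the `if` could use it, which is the part I don't know how to do.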

3 Answers

3

Create a predicate function, one for each case. Determine what cases you're using and use the associated predicate. Collect chosen predicates in a list (or compose them into a new predicate) then apply in your loop:

    import os

    # Each predicate takes the full path of a file.
    # Size limits are in bytes; date limits are timestamps (seconds since the epoch).
    predicates = []
    if flagName:
        predicates.append(lambda path: os.path.basename(path).find(sName) != -1)
    if flagSize:
        predicates.append(lambda path: sMinSize < os.stat(path).st_size < sMaxSize)
    if flagModified:
        predicates.append(lambda path: sModded1 < os.path.getmtime(path) < sModded2)
    if flagCreated:
        predicates.append(lambda path: sCreated1 < os.path.getctime(path) < sCreated2)

    for root, dirs, files in os.walk(strPath, followlinks=False):
        for fName in files:
            # os.walk yields bare file names, so build the full path first
            if all(p(os.path.join(root, fName)) for p in predicates):
                pass  # do stuff

You may want to use a named function instead of a lambda depending on your preferences. And for more complex scenarios, you may want to implement these as functors instead.
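
As a rough sketch of the "named function" and "compose them into a new predicate" variants mentioned above: the helper names below (`name_contains`, `size_between`, `modified_between`, `all_of`, `search`) are made up for illustration and are not part of any library. It also assumes the dates arrive as `YYYY-MM-DD` strings and converts them to timestamps once, before the loop.

```python
import os
from datetime import datetime


def name_contains(text):
    """Predicate: the file's base name contains `text`."""
    return lambda path: text in os.path.basename(path)


def size_between(min_bytes, max_bytes):
    """Predicate: min_bytes <= file size in bytes <= max_bytes."""
    return lambda path: min_bytes <= os.stat(path).st_size <= max_bytes


def modified_between(earliest, latest, fmt="%Y-%m-%d"):
    """Predicate: last-modified time lies between two date strings."""
    # Convert once, outside the returned lambda, so the loop only compares floats.
    start = datetime.strptime(earliest, fmt).timestamp()
    end = datetime.strptime(latest, fmt).timestamp()
    return lambda path: start <= os.path.getmtime(path) <= end


def all_of(predicates):
    """Compose several predicates into one that requires all of them."""
    predicates = list(predicates)
    return lambda path: all(p(path) for p in predicates)


def search(top, predicate):
    """Yield the full path of every file under `top` that satisfies `predicate`."""
    for root, dirs, files in os.walk(top, followlinks=False):
        for name in files:
            path = os.path.join(root, name)
            if predicate(path):
                yield path
```

Usage would look something like `list(search(strPath, all_of([name_contains("test"), size_between(2 * 1024**2, 15 * 1024**2)])))`. Everything about the user's choices is decided before `search` starts walking; inside the loop there is only one call per file.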

Jeff Mercado
  • Thanks Jeff, but I want the program to collect and format all the search criteria **before** the "for" loop begins. Your suggestion collects it during the loop, which is way too costly in terms of speed. ... unless I'm reading it wrong. – user7207540 Aug 25 '17 at 13:16
  • Premature optimisation is the root of all evil. And `eval` might be slower than you think (besides also being evil). I'd use this solution (or mine, which is pretty much equivalent but Jeff's is more elegant) over `eval` any day. – Amadan Aug 27 '17 at 03:29
  • @user7207540: the criteria _are_ collected before the loop. They're all in the list. No criterion that was not flagged will be in that list. There is no recalculation at all. If you want "fast," then build predicates for every single combination of flags that you have and use that. That will not be a very maintainable solution. And you should drop the _need_ to use `eval`, it is never the right choice for problems like this... If you don't like the list of predicates, compose them into a single function... like I mentioned. You will not get far if you rely on `eval`. – Jeff Mercado Aug 27 '17 at 08:06
  • @user7207540: there's a reason none of the answers you got to your question involved using eval... any other solution is a better solution than that. – Jeff Mercado Aug 27 '17 at 08:08
  • I've tried to implement Jeff's solution but I'm having an issue - if it only uses the first search criteria then everything works well; if I attempt any of the other options by themselves they return zero files – user7207540 Aug 29 '17 at 23:59
  • Regarding the eval function (which I won't be implementing), it is a good deal slower than Jeff's method - it took about 6 times as long to search my root drive than using the predicate function. I wouldn't say it's "evil" - if it were then it would not be included in Python and would have been deprecated long ago. I'm sure it has its place in the Python pantheon of programming, just not in my code ;) – user7207540 Aug 30 '17 at 00:11
  • Previous comment got truncated ... First off - thank you all again for your time and advice. I've tried to implement Jeff's solution but I'm having an issue - if it only uses the first search option (search by name) everything works; if I add any additional option it returns zero files; if I use any of the other options by themselves they return a list of every file residing in the directory from which the file is run. This is probably due to my implementation, not a fault in the method itself. I'm testing with hard-coded data, not variables. I need to do some more research and testing. – user7207540 Aug 30 '17 at 00:15
  • As I'd suspected, the error was in my implementation. Once I got it fixed everything works beautifully. Thanks very much to everyone for your input and advice, and especially to Jeff for showing me how to use predicate functions! – user7207540 Aug 31 '17 at 04:36
1

Here's a bit of lateral thinking. If you look for things that match criteria, they must all match. But if you look for things that don't match, just one needs to be wrong to disqualify. So you don't need to write complex queries; just checking one option at a time is good enough. And you can do it in a loop!

    import os

    # supplied by user (you might want to look into argparse)
    options = {
        "name": "jpg",
        "minsize": 1024,
    }

    # checking code: each checker takes the full path and the user's limit
    option_checkers = {
        "name": lambda path, limit: os.path.basename(path).find(limit) != -1,
        "minsize": lambda path, limit: limit <= os.stat(path).st_size,
        "maxsize": lambda path, limit: os.stat(path).st_size < limit,
    }

    def okay(path, options):
        for option, limit in options.items():
            if not option_checkers[option](path, limit):
                return False
        return True

    for root, dirs, files in os.walk(strPath, followlinks=False):
        for fName in files:
            if okay(os.path.join(root, fName), options):
                pass  # fits all criteria: do stuff
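
Since the code comment mentions argparse, here is a minimal sketch of how the `options` dictionary could be filled from the command line. The flag names (`--name`, `--minsize`, `--maxsize`) and the helper `parse_options` are assumptions made up for this example; only the `argparse` calls themselves are standard library.

```python
import argparse


def parse_options(argv=None):
    """Build an options dict containing only the criteria the user supplied."""
    parser = argparse.ArgumentParser(description="File search criteria")
    parser.add_argument("--name", help="substring the file name must contain")
    parser.add_argument("--minsize", type=int, help="minimum size in bytes")
    parser.add_argument("--maxsize", type=int, help="maximum size in bytes")
    args = parser.parse_args(argv)
    # Options that were not given stay None and are left out, so the
    # checking loop never sees a criterion the user did not ask for.
    return {key: value for key, value in vars(args).items() if value is not None}


if __name__ == "__main__":
    print(parse_options(["--name", "jpg", "--minsize", "1024"]))
```

With `type=int` the size limits arrive as integers, so they compare directly against `st_size`.
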
Amadan
  • That doesn't *appear* to address my problem. My problem isn't in telling the program what options the user chose, I'm trying to turn one or more string literals (a = "sMinSize < fileSize < sMaxSize", b = "and sMinSize < fileSize < sMaxSize") into a variable (iClause) that I can then use directly in my "if" statement. The actual "if" statement would be --> "if iClause:" – user7207540 Aug 25 '17 at 12:55
  • `eval`uating string literals is actually slower than this. See [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice). – Amadan Aug 27 '17 at 03:31
-1

I've found the solution - the "eval()" function. This will do exactly what I've been looking for. With this I can define my search fragments and write several short "if" statements that look like this:

    flagClause = 0
    if flagName == 1:
        iClause = "fName.find(sName) != -1"
        flagClause = flagClause + 1
    if flagSize == 1:
        if flagClause == 0:
            iClause = "sMinSize < fileSize < sMaxSize"
        else:
            iClause = iClause + " and sMinSize < fileSize < sMaxSize"
        flagClause = flagClause + 1
    ... etc

Now that my search string has been put together, I plug it into my "for" loop:

    for root, dirs, files in os.walk(strPath, followlinks=False):
        for fName in files:
            fPath = os.path.join(root, fName) # os.walk yields bare file names
            fileDate = os.path.getmtime(fPath)
            fileSize = os.stat(fPath).st_size
            if eval(iClause):
                do_stuff

With my search string created before the "for" loop begins, it doesn't have to leave the loop to check for each condition. This should be a relatively efficient search.

Now ... does anybody see anything wrong with this solution?

Final Edit:

Per the advice (and admonition) I've received, the "eval" function was not my solution. Instead, I used the method suggested by Jeff. His solution is faster, more efficient and easier to maintain.

Thanks again for everyone's input and advice!
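
For anyone who wants to check the speed difference themselves, a rough sketch follows. The function name `compare`, the sample values and the repeat count are all made up for illustration, and the timings will vary by machine, so no numbers are claimed here.

```python
import timeit


def compare(number=10_000):
    """Time an eval'd clause against an equivalent predicate function."""
    # Hypothetical values standing in for one file's data and the user's limits.
    namespace = {"size_min": 2, "size_max": 15, "file_size": 7}
    clause = "size_min < file_size < size_max"

    def predicate(ns):
        return ns["size_min"] < ns["file_size"] < ns["size_max"]

    # eval has to parse and compile the string on every single call.
    eval_seconds = timeit.timeit(
        lambda: eval(clause, {}, namespace), number=number
    )
    # The predicate was compiled once, when the function was defined.
    predicate_seconds = timeit.timeit(
        lambda: predicate(namespace), number=number
    )
    return eval(clause, {}, namespace), predicate(namespace), eval_seconds, predicate_seconds


if __name__ == "__main__":
    eval_result, predicate_result, eval_s, predicate_s = compare()
    print(f"same answer: {eval_result == predicate_result}")
    print(f"eval: {eval_s:.4f}s  predicate: {predicate_s:.4f}s")
```

Both approaches give the same answer for the same inputs; the difference is in how much work is repeated per file.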

  • Whenever you think you need `eval()`: ***stop***. You don't need `eval()`. Put those snippets into functions (lambdas would do), and put those into a dictionary or list to make it easy to dynamically pick one. `somedictionary[somekey]` mapping to a function can still be called, at which point you can pass in the context you need. – Martijn Pieters Sep 22 '17 at 07:04
  • In other words, exactly what Amadan proposes in his answer. – Martijn Pieters Sep 22 '17 at 07:05
  • How's it going, Martijn. Amadan's answer was very good, Jeff Mercado's was better as it did the job and returned results faster. Thank you very much. – user7207540 Sep 22 '17 at 13:22