0

Is it possible to do flat lazy generation in python? For instance, what I'm trying to do in the following code is passing in the results of os.walk and attempting to return only those results

def os_walk_transcript(self, walk_results):
    """Takes the results of os.walk on the data directory and returns a list of absolute paths"""
    file_check = lambda walk: len(walk[2]) > 0
    srt_prelim = lambda walk: ".srt.sjson" in " ".join(walk[2])
    relevant_results = (entry for entry in walk_results if file_check(entry) and srt_prelim(entry))
    return (self.os_path_tuple_srts(result) for result in relevant_results)

def os_path_tuple_srts(self, os_walk_tuple):
    srt_check = lambda file_name: file_name[-10:] == ".srt.sjson"
    directory, subfolders, file_paths = os_walk_tuple
    return [os.path.join(directory, file_path) for file_path in file_paths if srt_check(file_path)]

It's important that the results of os_walk_transcript are lazily evaluated, but I would like to be able to evaluated this in a flat manner as opposed to it's current nested-list evaluation.

For example: Currently when I ask for a result from the resulting generator I get a full list of ["1.srt.sjson", "2.srt.sjson", "3.srt.sjson"] and then if I call it again I would get: ["4.srt.sjson"] I'm working in a project where the data is large enough and inconsistent enough that this behaviour results in inconsistent performance and occasionally this will cause things to slow down more than I'd like. Is there any way to force the lazy evaluation to be even lazier and just load the objects up one at a time?

Slater Victoroff
  • 21,376
  • 21
  • 85
  • 144

2 Answers2

1

You can use itertools chain.from_iterable(). Documentation is here.

Basically, you can use it like this:

import itertools

myList = [[1,2,3],[4,5],[6],[7,8,9]]

itr = itertools.chain.from_iterator(myList)

itr will now be a generator object which returns the next element each time you call it. (in this case, it'll be exactly like xrange(10))

James
  • 2,635
  • 5
  • 23
  • 30
-1

Couldn't you just make a function like this?

def lazyarray(index):
    return str(index) + ".srt.sjson"

then you could even go so far as to do this

firstTen = [lazyarray(x) for x in xrange(10)]

Fully lazy, and very simple in its implementation. If you want to get a little less lazy (cache calculations) you might be able to do this.

cache = []
def lazyarray(index):
    if len(cache) <= index:
        cache += ["" for x in xrange(index - len(cache))]
    if cache[index] == "":
        cache[index] = str(index) + ".srt.sjson"
    return cache[index]

I haven't tested any of this code so it might require tweaking, and I'm not dealing with files, but isn't this what you were asking about?

And no matter where you are in code, rather than saying

lazyarray[5]

just say

lazyarray(5)

and it will have the same effect as an array.

EDIT: you could even override the __getitem__ method, as shown here and just have a custom generator class based on the code I posted above.

Community
  • 1
  • 1
coder543
  • 803
  • 6
  • 16
  • This addresses the lazy part, but not the flat part, which is the part of this question that I actually need help with. If you noticed in the code above, I'm already making copious use of lazy evaluation, however what I need is to change a lazy evaluation of lists into a lazy evaluation of single elements. I don't mean to be rude, but it seems you didn't read by question or look at my code before posting. – Slater Victoroff Jun 12 '13 at 22:20
  • I'm sorry you feel that way. I provided one way of answering this post, even if I didn't realize it wasn't what you were looking for. I did read both your code and the question, and I did my best to interpret it correctly. – coder543 Jun 13 '13 at 00:02
  • To quote, __you asked "Is there any way to force it to be lazier and just load up the objects one at a time?"__ This is the question I sought to answer. What I provided was a framework for making something very lazy in execution. Iterating through a list is easy, the key here was making it a lazy evaluation of the list. Apparently, either I failed at interpreting the post, or you needed to clarify your question. I'm glad someone interpreted it correctly, though. – coder543 Jun 13 '13 at 00:04
  • The problem was not lazy evaluation of a list. The problem was specifically the lazy evaluation of a list that was already being lazily evaluated. Very key distinction. One is solved with parentheses, the other requires use of an external module. You response was lazy, yes, but not that lazy. I needed something that was recursively lazy, whereas if the elements in your lazy array were lists you would be running into the exact same question that I initially posed. – Slater Victoroff Jun 13 '13 at 00:18