Is it possible to do flat lazy generation in python? For instance, what I'm trying to do in the following code is passing in the results of os.walk and attempting to return only those results
def os_walk_transcript(self, walk_results):
"""Takes the results of os.walk on the data directory and returns a list of absolute paths"""
file_check = lambda walk: len(walk[2]) > 0
srt_prelim = lambda walk: ".srt.sjson" in " ".join(walk[2])
relevant_results = (entry for entry in walk_results if file_check(entry) and srt_prelim(entry))
return (self.os_path_tuple_srts(result) for result in relevant_results)
def os_path_tuple_srts(self, os_walk_tuple):
srt_check = lambda file_name: file_name[-10:] == ".srt.sjson"
directory, subfolders, file_paths = os_walk_tuple
return [os.path.join(directory, file_path) for file_path in file_paths if srt_check(file_path)]
It's important that the results of os_walk_transcript are lazily evaluated, but I would like to be able to evaluated this in a flat manner as opposed to it's current nested-list evaluation.
For example: Currently when I ask for a result from the resulting generator I get a full list of ["1.srt.sjson", "2.srt.sjson", "3.srt.sjson"]
and then if I call it again I would get: ["4.srt.sjson"]
I'm working in a project where the data is large enough and inconsistent enough that this behaviour results in inconsistent performance and occasionally this will cause things to slow down more than I'd like. Is there any way to force the lazy evaluation to be even lazier and just load the objects up one at a time?