
I have this function in my code:

import re

def load_fasta(filename):
    f = open(filename)
    return (seq.group(0) for seq in re.finditer(r">[^>]*", f.read()))

This will leave the file open indefinitely, which isn't good practice. How do I close the file when the generator is exhausted? I guess I could expand the generator expression into a for loop with yield statements and then close the file afterwards. I'm trying to use functional programming as often as possible, though (just as a learning exercise). Is there a different way to do this?

Colin

1 Answer


Use yield instead of a single generator expression.

import re

def load_fasta(filename):
    with open(filename) as f:
        for seq in re.finditer(r">[^>]*", f.read()):
            yield seq.group(0)

for thing in load_fasta(filename):
    ...

The with statement will close the file once the for loop completes. Note that since you read the entire file into memory anyway, you could simply use

def load_fasta(filename):
    with open(filename) as f:
        data = f.read()
    for seq in re.finditer(r">[^>]*", data):
        yield seq.group(0)
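
One caveat, not part of the original answer: the `with` block only exits once the generator is exhausted or explicitly closed, so if the consumer breaks out of the loop early, the file stays open until the generator happens to be garbage-collected. A small sketch of how to guarantee prompt cleanup in that case, using `contextlib.closing` (the early-exit condition here is purely illustrative):

from contextlib import closing

# closing() calls the generator's close() method on exit, which raises
# GeneratorExit at the paused yield inside load_fasta and lets its `with`
# block release the file, even if we stop iterating early.
with closing(load_fasta(filename)) as records:
    for record in records:
        if record.startswith(">stop_here"):  # hypothetical early exit
            break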
chepner
  • I think you should either use `f.read()` or loop over the lines and pass them to `finditer()`. Note that with `f.read()` you are loading the whole file into memory anyway, so the generator gains you nothing. – Mazdak Oct 03 '16 at 18:43
  • Yeah, I wrote this up hoping that `finditer` could use an iterator itself instead of a string. It can't, so this isn't really any better than just returning a list of matches. – chepner Oct 03 '16 at 18:43
  • Is there a way to do this without loading everything into memory? – Colin Oct 03 '16 at 19:03
  • Not that I am aware of. The problem is that, in general, the regex engine may need to do backtracking, which means if it were reading directly from a generator, it might need to go backwards. Someone may have written some sort of `onlinefinditer` that does its own buffering to *minimize* the amount of data kept in memory, but there isn't anything like that in the standard library. – chepner Oct 03 '16 at 19:07
  • Is there a way to do it without regex? Since all I really need to do is split the file at '>' characters. – Colin Oct 03 '16 at 19:11
  • As far as I know, you would need to do your own buffering. (Reading a fixed-size chunk of bytes into a buffer and attempting to read a record from that.) http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python might have some useful tips. – chepner Oct 03 '16 at 19:39
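
Following up on the manual-buffering idea in the last comment, here is a minimal sketch that streams records without `re` and without reading the whole file: it reads fixed-size chunks and splits at `>` characters, keeping at most one record plus one chunk in memory. The function name, chunk size, and the assumption that the file begins with `>` (as valid FASTA does) are mine, not from the thread:

def load_fasta_lazy(filename, chunk_size=1 << 16):
    """Yield one '>'-delimited record at a time from a FASTA file,
    reading in fixed-size chunks instead of slurping the whole file."""
    with open(filename) as f:
        buffer = ""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            # Every record before the last '>' is complete; the tail may
            # still be growing, so keep it in the buffer.
            *complete, buffer = buffer.split(">")
            for record in complete:
                if record:  # skip the empty string before the leading '>'
                    yield ">" + record
        if buffer:  # the final record has no '>' after it
            yield ">" + buffer

The memory high-water mark is one record plus one chunk, which addresses the concern in the comments about `f.read()` loading everything at once.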