Python newbie here. I want to walk through a large mbox file, parsing email messages. I can do that with:

    import sys
    import mailbox

    def gen_summary(filename):
        # Iterate over every message in the mbox and print its subject.
        mbox = mailbox.mbox(filename)
        for message in mbox:
            subj = message['subject']
            print(subj)

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print('Usage: python genarchivesum.py mbox')
            sys.exit(1)
        gen_summary(sys.argv[1])

But I need more control. I need the byte offset at which a given email starts in the mbox file, and the number of bytes the message occupies on disk. Later, instead of iterating from the beginning of the mbox file, I want to seek directly to a given message and parse only that one, which is why I need the on-disk byte position. These are large mbox files, so efficiency is a concern.
The purpose of all this is so that I can generate a summary file, which contains some small bits about each email in the mbox, and then in the future efficiently look up individual emails within the mbox.
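
To make the requirement concrete, here is a minimal sketch of one possible approach, not a definitive implementation. It scans the file once in binary mode to record each message's start offset and on-disk length, then later seeks straight to one message and parses only that. It assumes Python 3 (for `email.message_from_bytes`) and the classic mbox layout, where each message begins with a `From ` line that is either the first line of the file or follows a blank line. The names `index_mbox` and `read_message` are hypothetical helpers made up for this illustration; they are not part of the `mailbox` module.

```python
import email


def index_mbox(path):
    """Return a list of (start_offset, length_in_bytes), one per message.

    Assumption: a message starts at a line beginning with b"From " that is
    either the first line of the file or directly follows a blank line.
    """
    starts = []
    offset = 0
    prev_blank = True  # treat the start of the file as a boundary
    with open(path, "rb") as f:
        for line in f:  # binary mode: len(line) is the exact byte count
            if prev_blank and line.startswith(b"From "):
                starts.append(offset)
            prev_blank = line in (b"\n", b"\r\n")
            offset += len(line)
    # Each message ends where the next one starts; the last ends at EOF.
    ends = starts[1:] + [offset]
    return [(start, end - start) for start, end in zip(starts, ends)]


def read_message(path, start, length):
    """Seek to one message and parse only that message."""
    with open(path, "rb") as f:
        f.seek(start)
        raw = f.read(length)
    # Drop the "From " separator line; the rest is the message itself.
    _, _, body = raw.partition(b"\n")
    return email.message_from_bytes(body)
```

The `(start, length)` pairs could be written to the summary file alongside the subject, so a later run can call `read_message(path, start, length)` without rescanning the mbox. The length counted here includes the `From ` separator line and any trailing blank line before the next message.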