How to perform multiple regex searches on file?

Question

I'm reading through an XML file to find four pieces of data, which I then use to update a database. How can I search for them more efficiently? Is going line by line the best approach here? Currently I read the whole file into a variable and then run each search against it.

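import re

# Compile each pattern once, outside the file-handling block.
# The dot before "com" is escaped, and [^"]* keeps the capture inside the quotes.
bill_mgr_email_re = re.compile(r'<BillingManagerInformation .* Email="([^"]*\.com)"')
num_bills_re = re.compile(r'NumberBills="(\d+)"')
num_ebills_re = re.compile(r'NumberOfEbills="(\d+)"')
num_mailed_re = re.compile(r'NumberOfMailedDocs="(\d+)"')

# Text mode, so the str patterns above can be applied to the data (Python 3).
with open(cur_file, 'r') as xml_file:
    data = xml_file.read()

# Each search rescans the data from the start; .search() returns None if nothing matches.
bill_mgr_email = bill_mgr_email_re.search(data).group(1)
num_bills = num_bills_re.search(data).group(1)
num_ebills = num_ebills_re.search(data).group(1)
num_mailed = num_mailed_re.search(data).group(1)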

Iterating over each line in a file is generally preferred over reading the whole thing into memory at once, mostly because you can then process files that are larger than your computer's available memory without having a memory problem. If the file is small, though, it really doesn't matter. — TigerhawkT3, Sep 28 '15 at 23:54
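As a rough illustration of the line-by-line approach, here is a minimal sketch. It reuses the patterns from the question (with the dot before "com" escaped) and assumes each attribute sits on the same line as its value, and that the opening tag and its Email attribute share a line. The names `PATTERNS` and `scan_lines` are made up for this example.

```python
import re

# Patterns taken from the question; the dictionary keys are illustrative names.
PATTERNS = {
    "bill_mgr_email": re.compile(r'<BillingManagerInformation .* Email="([^"]*\.com)"'),
    "num_bills": re.compile(r'NumberBills="(\d+)"'),
    "num_ebills": re.compile(r'NumberOfEbills="(\d+)"'),
    "num_mailed": re.compile(r'NumberOfMailedDocs="(\d+)"'),
}


def scan_lines(lines):
    """Return the first match of each pattern, reading only as far as needed."""
    found = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if name in found:
                continue  # keep only the first occurrence of each value
            match = pattern.search(line)
            if match:
                found[name] = match.group(1)
        if len(found) == len(PATTERNS):
            break  # all four values found; skip the rest of the file
    return found
```

It could be called as `with open(cur_file) as xml_file: values = scan_lines(xml_file)`. Only one line is held in memory at a time, and the loop stops early once all four values are found. A value that never appears is simply missing from the returned dictionary.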
Using something like [`xml.etree.ElementTree.iterparse`](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse) should be similarly efficient to using regular expressions (and as a side-benefit, not pull the whole file into memory, and only scan it once). Plus it's actually likely to be correct (rather than confusing attributes for text and what have you). [Never use regular expressions to parse XML/HTML/whatever](https://stackoverflow.com/a/1732454/364696). — ShadowRanger, Sep 29 '15 at 00:04
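A sketch of what the `iterparse` suggestion might look like. The question does not show the document layout, so this assumes the three counts are attributes on some element (any element), that `Email` is an attribute of `BillingManagerInformation`, and that the file uses no XML namespaces. `WANTED_ATTRS` and `extract_with_iterparse` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the regexes in the question.
WANTED_ATTRS = ("NumberBills", "NumberOfEbills", "NumberOfMailedDocs")


def extract_with_iterparse(source):
    """Collect the first occurrence of each wanted attribute in one pass."""
    found = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            elem.clear()  # drop the finished element's contents to limit memory use
            continue
        # Attributes are already populated when the "start" event fires.
        if elem.tag == "BillingManagerInformation" and "Email" in elem.attrib:
            found.setdefault("Email", elem.attrib["Email"])
        for name in WANTED_ATTRS:
            if name in elem.attrib:
                found.setdefault(name, elem.attrib[name])
        if len(found) == len(WANTED_ATTRS) + 1:
            break  # everything found; stop parsing early
    return found
```

Passing an open file object, for example `with open(cur_file, 'rb') as f: values = extract_with_iterparse(f)`, keeps the file's lifetime explicit when the loop exits early. Unlike the regex version, this does not require the email to end in `.com`, and it does not depend on how the attributes are laid out across lines.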
If you can make really strong assumptions about your data, then regexes can work for your specific use case. If you're unsure about the format of the data, then using regexes could easily mean that you miss elements whose format violates your assumptions. In that case, use a SAX parser instead. — beerbajay, Sep 29 '15 at 00:10
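For comparison, a minimal SAX sketch using the standard library's `xml.sax`. As above, the element and attribute layout is an assumption based on the regexes in the question, and `BillingHandler` and `extract_with_sax` are hypothetical names.

```python
import xml.sax


class BillingHandler(xml.sax.ContentHandler):
    """Record the first occurrence of each attribute of interest."""

    # Attribute names taken from the regexes in the question.
    WANTED_ATTRS = ("NumberBills", "NumberOfEbills", "NumberOfMailedDocs")

    def __init__(self):
        super().__init__()
        self.found = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; attrs behaves like a read-only mapping.
        if name == "BillingManagerInformation" and "Email" in attrs:
            self.found.setdefault("Email", attrs["Email"])
        for attr_name in self.WANTED_ATTRS:
            if attr_name in attrs:
                self.found.setdefault(attr_name, attrs[attr_name])


def extract_with_sax(source):
    """Parse a filename or file object and return the collected values."""
    handler = BillingHandler()
    xml.sax.parse(source, handler)
    return handler.found
```

SAX never builds a tree, so memory use stays flat regardless of file size. The trade-off in this sketch is that the parser reads the whole document even after all four values have been found.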
