I've switched to Python fairly recently and I want to clean up a very large number of web pages (around 12k; they can just as easily be treated as plain text files) by removing certain tags and other string patterns. For this I'm using Python's re.sub(...) function.
My question is whether it's more efficient to build one big regular expression that matches several of my patterns, or to call the function several times with smaller, simpler regular expressions.
For example, is it better to use something like
re.sub(r"<[^<>]*>", content)
re.sub(r"some_other_pattern", content)
or
re.sub(r"<[^<>]*>|some_other_pattern",content)
Of course, for the sake of the example the previous patterns are kept really simple, and I haven't compiled them here, but in my real-life scenario I will, along the lines of the sketch below.
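Roughly, this is what I mean by the compiled versions (the pattern names are just placeholders I made up for the example):

import re

# Compiled once and reused across all ~12k files
TAG_RE = re.compile(r"<[^<>]*>")
OTHER_RE = re.compile(r"some_other_pattern")
COMBINED_RE = re.compile(r"<[^<>]*>|some_other_pattern")

def clean_separately(content):
    # Several passes, one simple pattern each
    content = TAG_RE.sub("", content)
    return OTHER_RE.sub("", content)

def clean_combined(content):
    # One pass with the alternation
    return COMBINED_RE.sub("", content)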
Edit: The question is not about the HTML nature of the files, but about Python's behavior when dealing with multiple regex patterns.
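In case it's relevant, this is roughly how I'd measure the difference myself, using the two helpers from the sketch above (sample.html is just a stand-in name for one of my real files):

import timeit

# Read one representative file into memory
with open("sample.html", encoding="utf-8") as f:
    sample = f.read()

# Time both strategies on the same input
print("separate:", timeit.timeit(lambda: clean_separately(sample), number=1000))
print("combined:", timeit.timeit(lambda: clean_combined(sample), number=1000))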
Thanks!