I'm just running into a weird thing. I'm prototyping text-crawling using the Open ANC as corpora.
There are some texts where the re
module is just not responding. If someone can affirm that this about the RegEx complexity the re module can handle I'm fine.
The RegEx is preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired
The text the problem occures is:
My claim is that Lincoln’s address expresses the same idea that was then current in Europe. Each people of common history and language constitutes a nation, and the natural form for the nation’s survival was in a state structure. The idea that Americans constituted an organic national unit explained, implicitly, why the eleven Southern states could not go their own way. As he assumed the presidency, Lincoln still spoke of the Union rather than a nation; but in the course of the debates in the decades immediately preceding, the notion of union had acquired the metaphysical qualities of nationhood. In his first inaugural address, Lincoln invoked the “bonds of affection,” and even before shots were fired on Fort Sumter in Charleston Harbor, he stressed the unbreakable ties of historical struggle:
python code to produce problem:
import re
txt = "post text here"
regex = r"preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired"
re.findall(regex, txt)