I'm developing an application where users can use regular expressions to search and extract information from text. One user inserted the following regexp: ([^\s]*( ?)){3}\n
(they added a blank line at the end of the expression). This can be translated to Python:
import re
regexp = re.compile(r'([^\s]*( ?)){3}\n')
The regexp is used to extract an URL from a set of URLs separated by spaces. When you run the following command, the time spent evaluating the search is about half a second on my machine.
regexp.search(u'https://www.example.com/data/products/64973s3d.jpg https://www.example.com/data/products/j4e0ls.jpg https://www.example.com/data/products/sfd69es6d9.jpg https://www.example.com/data/products/6sd9fw.jpg https://www.example.com/data/products/64973s3d.jpg https://www.example.com/data/products/j4e0ls.jpg https://www.example.com/data/products/64973s3d.jpg https://www.example.com/data/products/j4e0ls.jpg https://www.example.com/data/products/64973s3d.jpg https://www.example.com/data/products/j4e0ls.jpg https://www.example.com/data/products/64973s3d.jpg https://www.example.com/data/products/j4e0ls.jpg')
When you remove the new line at end of the regexp (([^\s]*( ?)){3}
), the search is much faster.
Why does this happen? And is it possible to prevent this behavior so that users cannot slow down the whole application by using similar (slow) regular expressions?