I have a bit of a background in Formal Languages and recently I found out that Java and other languages use what is coined Extended Regular Languages. Because of my background, I had always assumed in languages like Java when I call compile for Pattern it produced a DFA or Transducer in the background. As a result, I had always assumed no matter how ugly my regex, no matter how long my regex, Pattern.matches or similar methods would run in linear time. But that assumption seems to be incorrect.
A post I read seems to suggest some Regex expressions do run in linear time, but I don't entirely believe or trust one single person.
I'll eventually write my own Java Formal Regular Expression library (existing ones I've found only have GNU GPL licences) but in the meantime I have a few questions on the time complexity of Java/C# regexs. Want to ensure what I read elsewhere is correct.
Questions:
- A formal language like \sRT\s., would the match method of a regexes in Java/C# solve membership in linear or non-linear time?
- In general, how would I know if a given Regular Language's expression membership problem is linear time for a Regex?
I do text analysis, finding out Java regexes aren't DFA's was really a downer.