I have the following words:

  1. is\s?(this|that|it)\s?true\s[?]?
  2. ^real$
  3. ^reall[y]*[\s]?[?]*$
  4. wh[a]*[t]*[?!][?]*

For every string, I have to search if any of these words are present in the string.

What's the best way to do it?

I have tried using:

import re

# Raw string so that \s reaches the regex engine unchanged.
re.search(
    r'is\s?(this|that|it)\s?true\s[?]?|^real$|^reall[y]*[\s]?[?]*$|wh[a]*[t]*[?!][?]*',
    string)

But it is very slow. Is there a better way to do this?

martineau
Jayanth
  • Is there always a space between every word in the string you're talking about? –  Apr 30 '17 at 20:44
  • Everything matching the second one will also match the third one. From what I see, this shouldn't be too slow - how slow is it? How fast do you need it? – tiwo Apr 30 '17 at 21:00
  • How large is your string? And _how_ slow is 'slow'? You can't really do any better than linear time anyway, so you can always try just searching for each string sequentially. (The more `|` operators you apply, the slower regular expressions become). – Akshat Mahajan Apr 30 '17 at 21:01
  • What do you think `[y]` is doing that is different from just `y`? – Bryan Oakley Apr 30 '17 at 21:30
  • The fourth one will match `wht`. Is that intentional? – Bryan Oakley Apr 30 '17 at 21:30
  • @BryanOakley yes, that's intentional – Jayanth Apr 30 '17 at 21:33
  • @AbdulrahmanAttia the space is optional – Jayanth Apr 30 '17 at 21:34
  • @BryanOakley my mistake. [y]* is not needed. It can be replaced with y* – Jayanth Apr 30 '17 at 21:36
  • You want to test every string for matches with up to 4 regular expressions. There's really no faster way to do that than what you're doing. Consider trying to trivially reject (or accept) some of them to reduce the number that need to go through the more rigorous pattern matching process. – martineau Apr 30 '17 at 21:37
  • @AkshatMahajan I am trying to find these words in a stream of tweet texts, so I want to know if there is a faster method than what I am using – Jayanth Apr 30 '17 at 21:42
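
The "trivially reject" idea from the comments can be sketched as below. This is only an illustration, not something posted in the thread: the names `REQUIRED_LITERALS` and `has_match_prefiltered` are made up here. It relies on the observation that every alternative in the pattern contains one of the literal substrings `true`, `real` or `wh`, so a string with none of them cannot match. Whether this is actually faster depends on the data and should be measured.

```python
import re

# The alternation from the question, written as a raw string.
PATTERN = re.compile(
    r'is\s?(this|that|it)\s?true\s[?]?'
    r'|^real$'
    r'|^reall[y]*[\s]?[?]*$'
    r'|wh[a]*[t]*[?!][?]*'
)

# Hypothetical helper data: each alternative above needs at least one
# of these literals ("reall..." also contains "real").
REQUIRED_LITERALS = ("true", "real", "wh")


def has_match_prefiltered(text):
    """Cheap substring check first, full regex search only if it passes."""
    if not any(literal in text for literal in REQUIRED_LITERALS):
        return False  # none of the alternatives can match
    return PATTERN.search(text) is not None
```

The pre-check is case-sensitive, like the pattern itself; if the search were made case-insensitive, the literals would need the same treatment.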

1 Answer

If you're using the same regular expression on a lot of strings, you can try using re.compile to save some time.
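
A minimal sketch of what that could look like, assuming the pattern from the question; the names `PATTERN` and `has_match` are chosen here for illustration only:

```python
import re

# Compile the alternation once, at import time, and reuse the object.
PATTERN = re.compile(
    r'is\s?(this|that|it)\s?true\s[?]?'
    r'|^real$'
    r'|^reall[y]*[\s]?[?]*$'
    r'|wh[a]*[t]*[?!][?]*'
)


def has_match(text):
    """Return True if any alternative matches somewhere in text."""
    return PATTERN.search(text) is not None
```

Calling `has_match(tweet)` for each incoming string then skips the per-call pattern lookup that `re.search(pattern_string, ...)` performs, although the gain may be small.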

tiwo
  • see [Is it worth using Pythons `re.compile`?](http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile) – tiwo Apr 30 '17 at 21:02
  • Doubt compiling the regex will do much good since the library already caches compiled versions of them internally. – martineau Apr 30 '17 at 21:28