
I'm not an expert at regular expressions by any stretch of the imagination! I understand the basics of how a regex is put together, but can a single regular expression search for two words that could appear anywhere within a piece of text?
For example, take the words hot and weather. They could be written as "the weather was hot", "during the hot weather" or "the weather has become even hotter".

Is it possible to write a regex that picks up all three scenarios but not, for example, "the picture was shot in poor weather"?

Any help would be appreciated. I'm working in PHP 5.6, by the way. Alternatively, is there a better way to do this that I haven't thought of?

modusCell
TIW
  • What do you need? Just to know whether the words occur (both or just one), or how many times, or...? – RST Aug 16 '14 at 23:21
  • Hint: word boundaries are `\b`, but regexes don't know the English language; it would treat `hotter` as NOT `hot`. – Wrikken Aug 16 '14 at 23:24
  • Ideally, to know whether all the words occur. The regex should also be able to accommodate more than two words if necessary. – TIW Aug 16 '14 at 23:24
  • @Wrikken: Yes, that's a valid point. I was thinking of adding a `.` character, but then that would also match hotel, which wouldn't work. :( – TIW Aug 16 '14 at 23:33
  • Yep, parsing the _meaning_ of text is waaaaay out of scope here. You'll need whole dictionaries, semantic rules, syntactic rules, a team of star programmers, and a few years if that's what you want. Or live with the occasional over-matching (hotel, shot) or undermatching (not matching hotter), whichever of those 2 suits your case the most. – Wrikken Aug 16 '14 at 23:36
  • @Wrikken: Yes, agreed! Definitely go with the undermatching! – TIW Aug 16 '14 at 23:39
  • You could expand it out to something like this: `(\bweather\b.*hot(ter|test)?\b|\bhot(ter|test)?\b.*weather\b)` – Kyle James Walker Aug 16 '14 at 23:43
  • @KyleWalker: Indeed, I assumed the words would be supplied by an end user who does not know regexes. If that assumption is wrong and these words can be entered by regex-savvy developers, then that would indeed work. Don't forget the `\b` before the first 'hot' & last 'weather' though ;). – Wrikken Aug 16 '14 at 23:46
  • In that case you could have the user enter two words and simply append some common endings to them to help it catch a bit more, narrowing down some false positives but also adding some false negatives. Oh, correct, it needs 2 more `\b`s. – Kyle James Walker Aug 16 '14 at 23:48
  • @KyleWalker: Thanks for your comments. Could you expand on what you mean by that, possibly as a full answer? – TIW Aug 16 '14 at 23:52

1 Answer


If you only need those two words, you could have the regex search for them with an optional list of word endings.

For example: `(\bweather\b.*\bhot(ter|test)?\b|\bhot(ter|test)?\b.*\bweather\b)`
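A minimal PHP sketch of how this two-word pattern could be checked with `preg_match()` against the three sentences from the question and the "shot" counter-example. The function name `matchesHotWeather()` is hypothetical; the pattern is written as a single-quoted PHP string so `\b` reaches PCRE unchanged.

```php
<?php
// Hypothetical helper, for illustration: true if "hot"/"hotter"/"hottest"
// and "weather" both appear, in either order.
function matchesHotWeather($text)
{
    // Branch 1: weather ... hot(ter|test); branch 2: hot(ter|test) ... weather.
    // \b on both sides of each word stops "shot" or "hotel" from counting as "hot".
    $pattern = '/(\bweather\b.*\bhot(ter|test)?\b|\bhot(ter|test)?\b.*\bweather\b)/';
    return preg_match($pattern, $text) === 1;
}

$samples = [
    'the weather was hot',
    'during the hot weather',
    'the weather has become even hotter',
    'the picture was shot in poor weather',
];

// Prints "match" for the first three samples and "no match" for the last.
foreach ($samples as $text) {
    echo $text, ' => ', (matchesHotWeather($text) ? 'match' : 'no match'), "\n";
}
```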

But if you need to build the regex from a user's input, you would want a fuller list of possible endings, for example `(s|er|est|ier|iest|ter|test)?`, extended with any others you need.

Example: `(\bweather(s|er|est|ier|iest|ter|test)?\b.*\bhot(s|er|est|ier|iest|ter|test)?\b|\bhot(s|er|est|ier|iest|ter|test)?\b.*\bweather(s|er|est|ier|iest|ter|test)?\b)`
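Building that pattern from user input could look roughly like the following sketch. `buildTwoWordPattern()` is a hypothetical helper and the ending list is an assumption to extend as needed; `preg_quote()` is used so that characters such as `.` or `(` typed by the user are matched literally instead of being read as regex syntax.

```php
<?php
// Hypothetical helper: build the either-order pattern for two user-supplied words.
function buildTwoWordPattern($first, $second)
{
    // Assumed list of optional endings; extend it to suit your text.
    $endings = '(s|er|est|ier|iest|ter|test)?';
    // preg_quote() escapes regex metacharacters; '/' is passed because it is our delimiter.
    $a = '\b' . preg_quote($first, '/') . $endings . '\b';
    $b = '\b' . preg_quote($second, '/') . $endings . '\b';
    // Allow the two words in either order.
    return '/(' . $a . '.*' . $b . '|' . $b . '.*' . $a . ')/';
}

$pattern = buildTwoWordPattern('hot', 'weather');
var_dump(preg_match($pattern, 'the weather has become even hotter')); // int(1)
var_dump(preg_match($pattern, 'the hotel had nice weather'));         // int(0)
```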

The only problem is that it would miss forms where the stem itself changes, such as silly becoming silliest or see becoming saw, unless you add further logic that looks at the original words.
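TIW mentioned in the comments that the regex may need to handle more than two words. The either-order alternation needs one branch per ordering of the words (2 for two words, 6 for three). A different technique, not the one used in this answer, is one lookahead per word anchored at the start of the text: each `(?=.*\bword\b)` independently checks that the word occurs somewhere. A sketch under those assumptions, with a hypothetical `containsAllWords()` function and the same assumed ending list:

```php
<?php
// Hypothetical helper using lookaheads (a swapped-in technique, not the
// alternation above): true only if every word in $words appears in $text.
function containsAllWords($text, array $words)
{
    // Assumed list of optional endings, as a non-capturing group.
    $endings = '(?:s|er|est|ier|iest|ter|test)?';
    $pattern = '/^';
    foreach ($words as $word) {
        // Each lookahead scans the whole text for one word, so order does not matter.
        $pattern .= '(?=.*\b' . preg_quote($word, '/') . $endings . '\b)';
    }
    // The s modifier lets . cross newlines, so words on different lines still count.
    $pattern .= '/s';
    return preg_match($pattern, $text) === 1;
}

var_dump(containsAllWords('the weather has become even hotter', ['hot', 'weather']));   // bool(true)
var_dump(containsAllWords('the picture was shot in poor weather', ['hot', 'weather'])); // bool(false)
var_dump(containsAllWords('hot sunny weather today', ['hot', 'weather', 'sunny']));     // bool(true)
```

The pattern grows by one lookahead per word rather than one branch per ordering; the stem-change limitation described above still applies.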

Kyle James Walker