0

I would like to check whether a string contains any word other than some predefined ones. The predefined words are "What is", "plus", "minus", "multiplied by", and "divided by"; some of the phrases contain a single space. I've read this post and this one, both of which use negative lookaheads, but couldn't come up with a pattern that worked.

For example, input text "What is plus abc divided by" should come back as "abc" not recognized.

What would be a correct regex for this?

Edit:

Note that I don't care what the invalid token is, just that it exists. It can be anything, a word or a number. The question can also be thought of as "check if the input contains only allowed words".

Abhijit Sarkar

3 Answers

1

Simply join them up in a group:

(?:What is|plus|minus|multiplied by|divided by)

Note that if you have, for example, multiply and multiply by (i.e. one token that is a prefix of another), multiply by must come first:

(?:What is|plus|minus|multiply by|multiply)
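
A minimal sketch of why the order matters, using Python's `re` purely for illustration (the variable names are made up for this example). Alternation tries the branches left to right and stops at the first one that succeeds, so a shorter token listed first can shadow a longer one:

```python
import re

text = "multiply by"

# Shorter alternative first: the engine is satisfied with "multiply".
shorter_first = re.compile(r"(?:multiply|multiply by)")
# Longer alternative first: the whole phrase is consumed.
longer_first = re.compile(r"(?:multiply by|multiply)")

print(shorter_first.match(text).group())  # multiply
print(longer_first.match(text).group())   # multiply by

# Sorting the tokens longest-first avoids ordering them by hand.
tokens = ["multiply", "multiply by", "plus"]
pattern = "|".join(re.escape(t) for t in sorted(tokens, key=len, reverse=True))
print(re.match(pattern, text).group())    # multiply by
```

In a fully anchored pattern, backtracking can often recover from a badly ordered alternation, but when the group is used on its own to scan a string the order decides what each match consumes.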

To check if the string only contains valid tokens, use:

^                  # Match at the start of string
\g<token>          # a pre-defined token
(?:\s+\g<token>)*  # followed by 0 or more tokens
$                  # right before the end of string.

...where \g<token> denotes the expression above.

Try it on regex101.com.
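
The same check can be sketched in Python, with one caveat: Python's `re` module has no `\g<token>` subroutine call, so the token group is interpolated into the pattern text instead. The names `ALLOWED`, `TOKEN`, and `only_allowed` are hypothetical, and `fullmatch` stands in for the `^...$` anchors.

```python
import re

# Allowed phrases from the question, longest first so that no phrase
# is shadowed by a shorter one that is its prefix.
ALLOWED = ["What is", "plus", "minus", "multiplied by", "divided by"]
TOKEN = "(?:" + "|".join(
    re.escape(t) for t in sorted(ALLOWED, key=len, reverse=True)
) + ")"

# One token, followed by zero or more whitespace-separated tokens.
ONLY_ALLOWED = re.compile(rf"{TOKEN}(?:\s+{TOKEN})*")


def only_allowed(text):
    """Return True if text consists solely of allowed phrases."""
    return ONLY_ALLOWED.fullmatch(text) is not None


print(only_allowed("What is plus divided by"))      # True
print(only_allowed("What is plus abc divided by"))  # False
print(only_allowed("What isplus"))                  # False
```

Requiring `\s+` between tokens is what rejects run-together input such as "What isplus".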

Original answer

Since we also need to find the (first) invalid token, you need to match every run of non-whitespace characters and capture in a group those that are not matched by the expression above:

(?:What is|plus|minus|multiplied by|divided by)|(\S+)

If the match contains group 1, that means it is a non-recognized token. Output an error accordingly.

Try it on regex101.com.
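
As a rough Python sketch of this approach (the helper name `unrecognized_tokens` is made up for illustration), iterate over all matches and keep only those where group 1 took part:

```python
import re

# Allowed phrases are tried first; anything else falls through to (\S+).
SCANNER = re.compile(r"(?:What is|plus|minus|multiplied by|divided by)|(\S+)")


def unrecognized_tokens(text):
    """Collect every run of non-whitespace that is not an allowed phrase."""
    return [m.group(1) for m in SCANNER.finditer(text) if m.group(1) is not None]


print(unrecognized_tokens("What is plus abc divided by"))  # ['abc']
print(unrecognized_tokens("What is plus divided by"))      # []
```

Note that this scanner does not require whitespace between tokens, so it is better suited to producing an error message than to strict validation.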

InSync
  • I don't need to capture the invalid token, just need to know if it exists. I've edited my question to clarify that. – Abhijit Sarkar Jun 15 '23 at 00:17
  • This doesn't answer the question because it matches almost anything. – Bohemian Jun 15 '23 at 00:48
  • @Bohemian How does it match "almost anything", given that valid tokens are pre-defined? – InSync Jun 15 '23 at 00:49
  • What *doesn't* it match is the question. See [live demo](https://rubular.com/r/p5k3WZPS2A9Xpc) and add some more examples yourself to see what I mean. – Bohemian Jun 15 '23 at 00:52
  • @Bohemian That regex is in the original answer which I have noted as such (at first, I thought OP needs a nice error message rather than just a boolean). Have you tried the new one then? – InSync Jun 15 '23 at 00:53
  • `^(?:\g<token>\s*)+$` seems to be simpler and does the job using your examples. Am I missing something? – Abhijit Sarkar Jun 15 '23 at 01:01
  • @AbhijitSarkar If you want `What isplus` to match, then yes. You can shorten the regex to [`^(?:\g<token>(?:\s+|$))+$`](https://regex101.com/r/QQeVJk/4), but I think that wouldn't be as readable. – InSync Jun 15 '23 at 01:31
  • @InSync "What isplus" isn't valid. So, sounds like, I got my solution. In hindsight, it's really simple, not sure why I was confused with lookaheads. I'm keeping your answer as accepted as I got my idea from it. – Abhijit Sarkar Jun 15 '23 at 01:50
  • I know that isn't valid, I'm presenting it as a counter-example since `^(?:\g<token>\s*)+$` matches it. – InSync Jun 15 '23 at 01:51
0

"... check if the input contains only allowed words".

You could capture the non-specified values and then check the result to see whether each one is allowed.

What is +(.+?) +(?:plus|minus) +(.+?) +(?:(?:multiplied|divided) by) +(.+)

Alternatively, specify the values in the pattern itself. In this case they are most likely numbers only.

What is +(\d+) +(?:plus|minus) +(\d+) +(?:(?:multiplied|divided) by) +(\d+)

Example

What is 1 plus 2 divided by 3

The captured groups would be 1, 2, and 3.

Finally, to allow fractional values:

What is +(\d+(?:\.\d+)?) +(?:plus|minus) +(\d+(?:\.\d+)?) +(?:(?:multiplied|divided) by) +(\d+(?:\.\d+)?)

Example

What is 1.23 plus 2.3 divided by 3
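
A small Python sketch of extracting the operands with the fractional pattern; `NUMBER` and `EXPRESSION` are illustrative names, and `fullmatch` is an added assumption so that trailing text is rejected:

```python
import re

# An integer with an optional fractional part, captured as one group.
NUMBER = r"(\d+(?:\.\d+)?)"
EXPRESSION = re.compile(
    rf"What is +{NUMBER} +(?:plus|minus) +{NUMBER} "
    rf"+(?:(?:multiplied|divided) by) +{NUMBER}"
)

m = EXPRESSION.fullmatch("What is 1.23 plus 2.3 divided by 3")
print(m.groups() if m else None)  # ('1.23', '2.3', '3')

# A non-numeric operand makes the whole match fail.
print(EXPRESSION.fullmatch("What is 1 plus abc divided by 3"))  # None
```

This pattern fixes the shape of the sentence (one plus/minus followed by one multiplied/divided by), so it validates a specific expression form rather than an arbitrary sequence of allowed words.
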
Reilas
-1

Use a negative lookahead so that the pattern matches only when the whole input is not made up of just the allowed phrases:

^(?!((^| )(What is|plus|minus|multiplied by|divided by)(?= |$))+$).*

See live demo.
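
A hedged Python sketch of the negative-lookahead idea (the names `TOKEN`, `HAS_INVALID`, and `has_invalid_token` are hypothetical). One detail is an assumption of this sketch: the separator after each phrase is checked with a lookahead, `(?= |$)`, rather than consumed, so the same space can still serve as the leading separator of the next phrase.

```python
import re

TOKEN = r"(?:What is|plus|minus|multiplied by|divided by)"

# Matches only when the input is NOT a single-space-separated run of
# allowed phrases. (?= |$) peeks at the separator without consuming it.
HAS_INVALID = re.compile(rf"^(?!(?:(?:^| ){TOKEN}(?= |$))+$).*")


def has_invalid_token(text):
    """Return True if text contains anything besides allowed phrases."""
    return HAS_INVALID.match(text) is not None


print(has_invalid_token("What is plus divided by"))      # False
print(has_invalid_token("What is plus divided by abc"))  # True
print(has_invalid_token("What is plus abc divided by"))  # True
```

Here a match signals invalid input, which is the inverse of the anchored "only allowed tokens" check.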

Bohemian
  • Fails for "What is plus divided by abc". I found something that passes my tests, mentioned in my comment to @InSync's answer. `^(?:\g<token>\s*)+$` – Abhijit Sarkar Jun 15 '23 at 01:05
  • @AbhijitSarkar oops - left out the `$` at the end of the look ahead. Regex and demo link updated. – Bohemian Jun 15 '23 at 01:47