
I want to test that a string contains the word ok.

What I initially tried to do was write this regex:

ok[^a-z]

This means the character right after my desired ok can be anything except a lowercase letter, such as whitespace or punctuation. It correctly distinguishes between ok and okay, catching only the ok.

However, the problem happens if my entire input string ends right after the ok. The [^a-z] mandates that another character be present to be satisfied. I don't know how to write a condition that says no more characters is also acceptable.
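A minimal reproduction of the problem, as a sketch: the `check` helper is a hypothetical name used only for illustration, and the pattern is stored in a variable purely for readability.

```shell
#!/usr/bin/env bash
# Reproduces the problem: [^a-z] must consume one character after "ok".
re='ok[^a-z]'

# Hypothetical helper: prints "match" or "no match" for one input string.
check() {
    if [[ "$1" =~ $re ]]; then
        echo "match"
    else
        echo "no match"
    fi
}

check "ok hello world"   # a space follows "ok", so this prints: match
check "hello world ok"   # nothing follows "ok", so this prints: no match
```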

Desired results:

  • ok - satisfied
  • ...ok... - satisfied
  • ok hello world - satisfied
  • hello world ok - satisfied
  • okay - not satisfied
  • o k - not satisfied

This is to be used in a Bash script, which according to this page, uses the ERE regex dialect.


Intended usage example:

if [[ "$@" =~ ok[^a-z] ]]; then
    echo "Argument contains an \"ok\""
fi
Digital Ninja
  • I don't see anything that indicates "numeric" in your pattern. Anyway, it looks like you're looking for a [word boundary](https://www.regular-expressions.info/wordboundaries.html) (e.g., `ok\b` or `\bok\b`)? This will match "ok" only when not followed by an alphanumeric or underscore character. Alternatively, you could use a negative lookahead: `ok(?![a-z0-9])` if you want to be more specific. – 41686d6564 stands w. Palestine Sep 27 '22 at 02:55
  • Neither of those worked for me. I'm not sure if I should have also noted how I tried to use them in the script and what my options there are. In my attempts, if I now pass `ok` as an argument to my script, and then do `if [[ "$1" =~ ok\b ]]; then...`, as I understand it, the condition should be satisfied? – Digital Ninja Sep 27 '22 at 03:08
  • **Update:** It works after storing the `ok\b` in a variable and then using the variable in the condition. I don't quite understand why it's necessary in this case, but a hint towards that can be found [here](https://stackoverflow.com/a/19146454/3788043). Thanks for pointing me in the right direction! – Digital Ninja Sep 27 '22 at 03:44
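A sketch of the variable approach, under a few assumptions. As far as I understand it, a backslash written directly inside `[[ ... =~ ... ]]` is treated by the shell as quoting before the regex library ever sees it, while a pattern held in a variable is passed through unchanged. Also, `\b` is not part of POSIX ERE, so whether it works depends on the system's regex library. The sketch below therefore uses a plain ERE alternation instead of `\b`; the `contains_ok` helper is a hypothetical name.

```shell
#!/usr/bin/env bash
# Plain ERE: "ok" followed by a non-lowercase-letter OR the end of the string.
re='ok([^a-z]|$)'

# Hypothetical helper: succeeds if "ok" is not followed by a lowercase letter.
contains_ok() {
    [[ "$1" =~ $re ]]   # $re is left unquoted so it is treated as a regex
}

# The inputs from the "Desired results" list in the question.
for s in "ok" "...ok..." "ok hello world" "hello world ok" "okay" "o k"; do
    if contains_ok "$s"; then
        echo "'$s': satisfied"
    else
        echo "'$s': not satisfied"
    fi
done
```

This only constrains what follows the ok, like the original attempt, so an input such as book would still be accepted. Something like `(^|[^a-z])ok([^a-z]|$)` would also constrain the preceding character, if that matters.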

1 Answer

  • \w in regex matches any letter, digit, or underscore
  • \W is the opposite: it matches any character that is not a letter, digit, or underscore
  • $ matches the end of a line
  • (a|b) matches a or b
  • (?=...) is a positive lookahead: it matches only if what follows also matches, without consuming the text we look ahead for

We can combine all of these into a single regex that matches ok followed by either a non-word character or the end of the line:

ok(?=\W|$) - matches ok only if the lookahead finds a non-word character or the end of the line

  • Is this meant to work in Bash? Because it doesn't seem to work for me. I've edited my original question to add a usage example, perhaps it doesn't work with simple conditions and is meant to be invoked in some other way? – Digital Ninja Sep 27 '22 at 03:33
  • it should work https://regex101.com/r/cqa4ph/1 – Nicholas Hansen-Feruch Sep 27 '22 at 03:35
  • That isn't the same regex engine, and as [this answer](https://stackoverflow.com/a/65856276/3788043) hints, lookaheads and behinds are indeed not supported in Bash. But I've managed to solve it with some hints from the other comments. Thanks for trying to help! – Digital Ninja Sep 27 '22 at 03:49
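Since lookaheads are not available here, one ERE-only substitute is an ordinary group, with the difference that the group consumes the following character instead of merely peeking at it. The sketch below illustrates that difference using `BASH_REMATCH`; the `extract_ok` helper and the extra capture group around ok are assumptions added for illustration, not something from the posts above.

```shell
#!/usr/bin/env bash
# Group 1 captures "ok"; group 2 consumes the next character, or nothing at the end.
re='(ok)([^a-z]|$)'

# Hypothetical helper: shows the whole match and the captured word.
extract_ok() {
    if [[ "$1" =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] is group 1.
        echo "whole='${BASH_REMATCH[0]}' word='${BASH_REMATCH[1]}'"
    else
        echo "no match"
    fi
}

extract_ok "ok, hello"   # prints: whole='ok,' word='ok'
extract_ok "hello ok"    # prints: whole='ok' word='ok'
extract_ok "okay"        # prints: no match
```

For a plain yes/no test the consumed character makes no difference; it only matters when the matched text itself is used afterwards.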