
How can I write a regex that allows a pattern to start with a specific character, but that character is optional?

For example, I would like to match all instances of the word "hello" where "hello" is either at the very start of the line or preceded by an "!", in which case it does not have to be at the start of the line. So the first three options here should match, but not the last:

hello
!hello
some other text !hello more text
ahello

I'm specifically interested in JavaScript.

maxedison

2 Answers


Match it with: /^hello|!hello/g

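The ^ will only grab the word "hello" if it's at the very beginning of the input. To anchor it to the beginning of every line in a multi-line string, add the m flag: /^hello|!hello/gm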

The | works as an OR.

var str = "hello\n!hello\n\nsome other text !hello more text\nahello";

var regex = /^hello|!hello/g;

console.log( str.match(regex) );
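
To illustrate how the m flag changes what ^ anchors to, here is a minimal sketch. The sample string is a hypothetical input chosen so that "hello" starts the second line rather than the whole string.

```javascript
// Hypothetical sample input: "hello" starts the second line, not the string.
const sample = "abc\nhello\nsee !hello";

// Without m, ^ only matches at the very start of the input.
const withoutM = sample.match(/^hello|!hello/g);

// With m, ^ also matches right after each line break.
const withM = sample.match(/^hello|!hello/gm);

console.log(withoutM); // [ '!hello' ]
console.log(withM);    // [ 'hello', '!hello' ]
```

Note that both variants return the full match, so the "!" is part of the result whenever it was matched.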

Edit:

If you're trying to match the whole line beginning with "hello" or containing "!hello" as suggested in the comment below, then use the following regex:

/^.*(^hello|!hello).*$/gm

var str = "hello\n!hello\n\nsome other text !hello more text\nahello";

var regex = /^.*(^hello|!hello).*$/gm;

console.log(str.match(regex));
thingEvery
  • Your answer is incorrect because it didn't match this string **some other text hello more text** – Patrissol Kenfack Mar 25 '20 at 03:37
  • @PatrissolKenfack OP didn't ask to match the whole line. The question was to **match all instances of the word "hello" where "hello" is either at the very start of the line or preceded by an "!"** – thingEvery Mar 25 '20 at 03:39
  • @thingEvery OP says _character is optional_, I think OP wants to somehow match `!` if that was the first character at the start of the string before "hello". Like using recursive subpatterns maybe, but there are none in JS. – kishkin Mar 25 '20 at 07:20
  • Yup, the first example is exactly what I was after. Surprised I didn't figure that out given how simple it was! – maxedison Mar 28 '20 at 00:50

Final solution (hopefully)

It looks like collecting the capture groups of every match directly is only available as of ECMAScript 2020. Link 1, Link 2.

As a workaround I've found the following solution:

const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;

function collectGroups(regExp, str) {
  const groups = [];
  str.replace(regExp, (fullMatch, group1, group2) => {
    groups.push(group1 || group2);
  });
  return groups;
}
// ^ needs the m flag to match at every line start; \b rejects "helloing".
const regex = /^(hello\b)|(?:!)(hello\b)/gm;
const groups = collectGroups(regex, str);
console.log(groups);
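
The ES2020 feature mentioned above is presumably String.prototype.matchAll; that is an assumption, since the links are not reproduced here. Where it is available, the same groups could be collected without the replace() trick. A minimal sketch with a shortened, hypothetical test string:

```javascript
// Requires a runtime with String.prototype.matchAll (ES2020).
const text = `hello
!hello
some other text !hello more text
ahello
helloing or helloed`;

// Group 1: "hello" at the start of a line; group 2: "hello" preceded by "!".
const pattern = /^(hello\b)|!(hello\b)/gm;

// Each match is an array; indexes 1 and 2 hold the capture groups.
const captured = Array.from(text.matchAll(pattern), m => m[1] || m[2]);

console.log(captured); // [ 'hello', 'hello', 'hello' ]
```

matchAll throws a TypeError if the regex lacks the g flag, so the flag is not optional here.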

/(?=!)?(\bhello\b)/g should do it. Playground.

Example:

const regexp = /(?=!)?(\bhello\b)/g;

const str = `
hello
!hello
some other text !hello more text
ahello
`;

const found = str.match(regexp)

console.log(found)

Explanation:

  • (?=!)?

    • (?=!) positive lookahead for !
    • ? makes the preceding lookahead optional
  • (\bhello\b): capturing group

    • \b word boundary ensures that hello is not directly preceded or followed by a word character

Note: If you also want to make sure that hello is not followed by !, you could simply add a negative lookahead like so: /(?=!)?(\bhello\b)(?!!)/g.


Update

Thanks to @JvdV's hint in the comments, I've adapted the regex, which should now meet your requirements:

/(^hello\b)|(?:!)(hello\b)/gm

Playground: https://regex101.com/r/CXXPHK/4 (The explanation can be found on the page as well).


Update 2:

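Looks like the non-capturing group (?:!) doesn't help with str.match in JavaScript: with the g flag, match returns the full matches, and a non-capturing group is still part of the full match. So I get a result like ["hello", "!hello", "!hello", "!hello"], where ! is also included. But who cares, here is a workaround: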

const regex = /(^hello\b)|(?:!)(hello\b)/gm;
const found = (str.match(regex) || []).map(m => m.replace(/^!/, ''));
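
A possible alternative to stripping the ! afterwards, assuming the runtime supports lookbehind assertions (ES2018; not available in every older browser), is to move the "line start or !" condition into a lookbehind so it never becomes part of the match. This is only a sketch; the test string below is a hypothetical shortened version of the one above.

```javascript
const input = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello`;

// (?<=^|!) asserts that "hello" sits at a line start or right after "!",
// without making the "!" part of the match.
const lookbehind = /(?<=^|!)hello\b/gm;

const hits = input.match(lookbehind) || [];
console.log(hits); // [ 'hello', 'hello', 'hello', 'hello' ]
```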
Kenan Güler
  • 1
    This would still match `hello` even when it isn't at the string's starting position. Only when preceded by `!` it was allowed to be further on in the string. – JvdV Mar 25 '20 at 08:47
  • 1
    @JvdV Oh, thanks for your hint! You're right, I should have read the question more carefully... Updated my answer. – Kenan Güler Mar 25 '20 at 10:12
  • 1
    Almost there, now you still allow words at the start like: `helloing` or `helloed` =). Meaning, the point is: the specified string can still be a substring of a longer word. Btw, the other answer does so too still. – JvdV Mar 25 '20 at 10:32
  • 1
    @JvdV great catch! forgot a `\b` there and also a non-capturing group. Should be actually `(^hello\b)|(?:!)(hello\b)`. Updating my answer. Thank you for your feedback, appreciated! – Kenan Güler Mar 25 '20 at 10:38
  • 1
    A whole match is still group 0, so since you're not specifically pulling capture group 1 or 2, you'll get a full match including the `!`. Your workaround is fine, but then there is no reason for a non-capture group anymore. Also see [this](https://stackoverflow.com/q/18178597/9758194). The best solution I can think of myself would be `(^|!)(hello\b)` and grab capture group 2 =). This would also circumvent your update2 solution. Sorry for so many comments, I only meant them as positive feedback =) – JvdV Mar 25 '20 at 11:01
  • @JvdV positive/negative feedback is always welcome! Good point. Looks like `str.match(regex)` delivers only the full matches, thus we see results like `!hello`. I will try to find another solution. Btw. [here](https://repl.it/repls/CyanUntriedMegahertz) I created another example with python, and it works beautifully. – Kenan Güler Mar 25 '20 at 14:09
  • About your suggestion: I don't know whether your regex is a complete solution, but with that you'd also catch groups that contain only `!`, which is kinda not nice. :D – Kenan Güler Mar 25 '20 at 14:12
  • 1
    @JvdV I kinda found a solution, I think. Updated my answer. I'm outta here, take care! – Kenan Güler Mar 25 '20 at 15:27
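
JvdV's suggestion from the comments, `(^|!)(hello\b)` with capture group 2, can be sketched as follows. The m flag and the exec() loop are assumptions added here so that ^ applies to every line and the code also runs on engines without matchAll; the test string is hypothetical.

```javascript
const lines = `hello
!hello
some other text !hello more text
ahello
helloing`;

// Group 1 is the line start or "!", group 2 is the word itself.
// The m flag is an assumption here, so that ^ matches at every line start.
const re = /(^|!)(hello\b)/gm;

const words = [];
let m;
// exec() advances re.lastIndex on each call because of the g flag.
while ((m = re.exec(lines)) !== null) {
  words.push(m[2]);
}

console.log(words); // [ 'hello', 'hello', 'hello' ]
```

Group 1 still captures either an empty string or "!", but since only group 2 is read, it never reaches the result.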