
Regexp: match all pieces of a string with [a-z_]+ and skip an optional _[a-z0-9]{24} suffix?

For instance,

hello word some_stuff other_stuff_607eea770b6d00003d001579 something

Should only capture/match

hello word some_stuff other_stuff something

Here's what I have, but it still matches part of the [a-z0-9]{24} suffix:

/[a-z]+(_[a-z]+)?(?:[a-z0-9]{24})?/
Eric

2 Answers


You want to match whole words made of letters and underscores, where each match ends either at a word boundary or just before an underscore followed by 24 letters and/or digits:

\b[a-z_]+(?=_[0-9a-z]{24}|\b)
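
As a minimal sketch of how this pattern could be used from PHP (the language mentioned in the comments), the snippet below runs it against the sample string from the question. The variable names are illustrative, and the `/` delimiters are an assumption about how the pattern would be wrapped.

```php
<?php
// Sketch only: applies the lookahead pattern from this answer
// to the sample string from the question.
$pattern = '/\b[a-z_]+(?=_[0-9a-z]{24}|\b)/';
$subject = 'hello word some_stuff other_stuff_607eea770b6d00003d001579 something';

// preg_match_all() collects every full match into $matches[0].
preg_match_all($pattern, $subject, $matches);

// Prints: hello, word, some_stuff, other_stuff, something
echo implode(', ', $matches[0]), PHP_EOL;
```

Note that the lookahead only checks that 24 letters or digits follow the underscore; it does not check what comes after them.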
Grismar

Since you mention PHP in a comment on the accepted answer, you could also use a (*SKIP)(*FAIL) approach:

_[0-9a-z]{24}(*SKIP)(*FAIL)|[a-z]+(?:_[a-z]+)*

In parts, the pattern matches:

  • _[0-9a-z]{24} Match _ followed by 24 characters in the ranges 0-9 and a-z
  • (*SKIP)(*FAIL) Discard what was just matched and skip past it, so it cannot be part of any match result
  • | or
  • [a-z]+ Match 1+ chars a-z
  • (?:_[a-z]+)* Optionally repeat _ and 1+ chars a-z


Example code

$re = '/_[0-9a-z]{24}(*SKIP)(*FAIL)|[a-z]+(?:_[a-z]+)*/';
$str = 'hello word some_stuff other_stuff_607eea770b6d00003d001579 something';

preg_match_all($re, $str, $matches);

var_export($matches[0]);

Output

array (
  0 => 'hello',
  1 => 'word',
  2 => 'some_stuff',
  3 => 'other_stuff',
  4 => 'something',
)
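
To see what the (*SKIP)(*FAIL) branch contributes, here is a rough comparison of the word pattern on its own against the full pattern, using the same sample string. The variable names `$without` and `$with` are illustrative only.

```php
<?php
$subject = 'hello word some_stuff other_stuff_607eea770b6d00003d001579 something';

// Word pattern alone: letters inside the 24-character id are matched too.
preg_match_all('/[a-z]+(?:_[a-z]+)*/', $subject, $without);

// Same word pattern with the (*SKIP)(*FAIL) branch placed in front of it.
preg_match_all('/_[0-9a-z]{24}(*SKIP)(*FAIL)|[a-z]+(?:_[a-z]+)*/', $subject, $with);

echo implode(' ', $without[0]), PHP_EOL;
// hello word some_stuff other_stuff eea b d d something

echo implode(' ', $with[0]), PHP_EOL;
// hello word some_stuff other_stuff something
```

Without the first alternative, the engine resumes scanning inside the id and picks up its letter runs (eea, b, d, d); with it, the whole id is consumed and thrown away before the word pattern gets a chance to match there.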
The fourth bird