1

I have a string

CO12dadaCO2dafdCO345daaf

I want to extract all occurrences of CO followed by some digits, up to (but not including) the next CO. I tried /CO(\d*)([\s\S]*)/.

In this case I want to get the output:

['CO12dada', 'CO2dafd', 'CO345daaf']

The regex above doesn't work: the greedy [\s\S]* also swallows all the remaining CO's in a single match.

I could get the index of the first match using str.search, but I need the indexes of all occurrences.

eguneys

4 Answers

1

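const string = 'CO12dadaCO2dafdCO345daaf'
// Lazily match from each "CO" up to (but not including) the next "CO" or the end of the string
const result = string.match(/(CO.*?)(?=CO|$)/g)
console.log(result) // ['CO12dada', 'CO2dafd', 'CO345daaf']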
stranded
1

Just get your matches with .split():

console.log("CO12dadaCO2dafdCO345daaf".split(/(?!^)(?=CO)/))

Result:

[
  "CO12dada",
  "CO2dafd",
  "CO345daaf"
]

(?!^)(?=CO) matches the empty string immediately before each CO substring, except at the start of the string.

Ryszard Czech
  • Cool I didn't know split took regex like that. Is that a positive lookahead, what does (?!^) do? – eguneys Sep 14 '20 at 20:20
  • @eguneys `(?!^)` is a negative lookahead, please see [this question about it](https://stackoverflow.com/questions/15669557/regex-match-pattern-as-long-as-its-not-in-the-beginning) – Ryszard Czech Sep 14 '20 at 20:22
  • So split is equivalent of a `match` with a global regex? – eguneys Sep 14 '20 at 20:32
  • This is confusing, because the regex doesn't match the input but somehow makes marks (what does that mean?), and in the case of split the string is split at those marks. Can you please clarify your answer? – eguneys Sep 14 '20 at 20:39
  • @eguneys The `g` flag is redundant in `split`, it is the default behavior to look for all matches and split with them in `split`. The regex **matches** the input well, just it matches empty strings, and divides the string right at those positions. – Ryszard Czech Sep 14 '20 at 20:40
  • I don't quite understand how a regex can match empty string at certain positions. And why would I think of making a regex like that for any use case other than with `split`. – eguneys Sep 14 '20 at 20:51
  • @eguneys This is out of scope, but you may want to match empty locations in a string to insert something there. `"a1b2".replace(/(?=\d)/g, '-')` returns `a-1b-2`. A must-read for you is ["Lookahead and Lookbehind Zero-Length Assertions"](https://www.regular-expressions.info/lookaround.html). Also, see [Mastering Lookahead and Lookbehind](https://www.rexegg.com/regex-lookarounds.html). – Ryszard Czech Sep 14 '20 at 20:57
0

Or this one:

CO\w+?(?=CO|$)

see demo here: https://regex101.com/r/gFZomh/1

Basically: a "non-greedy" matching of all "word characters" after "CO" followed by a lookahead demanding another "CO" or end-of-string.

If you also want to match "non-word characters", you could modify the regexp to

CO[\w\W]+?(?=CO|$)

This will also work on something like "CO12dadaCO2da,fdCO345daaf" to produce the matches: ["CO12dada","CO2da,fd","CO345daaf"].
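
A quick sketch of the difference between the two patterns on that input; the variable names are illustrative only:

```javascript
const sample = 'CO12dadaCO2da,fdCO345daaf';

// \w cannot cross the comma, so the middle chunk never reaches a lookahead that succeeds.
const wordOnly = sample.match(/CO\w+?(?=CO|$)/g);
console.log(wordOnly); // ['CO12dada', 'CO345daaf']

// [\w\W] matches any character, so the chunk containing the comma is found as well.
const anyChar = sample.match(/CO[\w\W]+?(?=CO|$)/g);
console.log(anyChar); // ['CO12dada', 'CO2da,fd', 'CO345daaf']
```

Note that both patterns use `+?`, so a bare `CO` immediately followed by another `CO` would not be matched on its own.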

Carsten Massmann
0

In JavaScript, you can use

CO[^]*?(?=CO|$)
  • CO[^]*? Match CO, then any character including newlines, as few as possible
  • (?=CO|$) Positive lookahead, assert what is on the right is either CO or the end of the string
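
As a small illustration of why matching newlines can matter, the sketch below runs this pattern and a `.`-based variant on a made-up input that contains a line break. The input and variable names are assumptions for the example, not part of the question:

```javascript
// Hypothetical input with a newline inside the first chunk.
const multiline = 'CO12da\ndaCO2dafd';

// [^] matches any character, newlines included.
const withNewlines = multiline.match(/CO[^]*?(?=CO|$)/g);
console.log(withNewlines); // ['CO12da\nda', 'CO2dafd']

// Without the s flag, . stops at the newline, so the first chunk is not matched.
const dotOnly = multiline.match(/CO.*?(?=CO|$)/g);
console.log(dotOnly); // ['CO2dafd']
```

`[^]` is specific to JavaScript's regex flavor; `[\s\S]`, or `.` with the `s` flag where supported, are commonly used to the same effect.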

Regex demo

The fourth bird