
I want to match a number that appears at the end of a string, or possibly in the middle.

The example would be:

"chapter some word 12 or IV" or "chapter some word 12 or IV some word"

The value I want to extract from the string is "12" or "IV".

I have tried lookarounds with ?:\w* or ?=\w*, but that does not work.

My regex:

if (preg_match('/ch\w*\s*\K(?|(?=((?<=|\s)\d+(?:\.\d+)?))|(?=([ivx]+(?![a-z]))))/i', $string, $matches)){
    print_r($matches);
}

Am I missing something with the regex? Any pointers in the right direction would be appreciated.

Wiktor Stribiżew
niznet

1 Answer


You may use

'~\bch\D*\K\d+~i'

See the regex demo
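As a minimal sketch of how this pattern could be used from PHP, the snippet below wraps it in a helper. The function name `extractChapterDigits` is hypothetical and not part of the original answer; the sample strings come from the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first run of
// digits that follows a word starting with "ch", or null when there is none.
function extractChapterDigits(string $text): ?string
{
    // \bch : "ch" at the start of a word
    // \D*  : any run of non-digit characters
    // \K   : discard everything matched so far from the reported match
    // \d+  : the digits to extract
    if (preg_match('~\bch\D*\K\d+~i', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(extractChapterDigits('chapter some word 12 or IV'));
var_dump(extractChapterDigits('chapter some word 12 or IV some word'));
```

Because `\D*` can only consume non-digits, this version finds Arabic digits only; a string such as "chapter IV" yields no match.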

To match one or more digits or a Roman numeral from 1 to 10 (I to X) after the word, use

'~\bch.*?\K\b(?:\d+|VI{0,3}|I(?:[XV]|I{0,2}))\b~i'

See this regex demo

Here, a set of alternatives is added to \d+: VI{0,3}|I(?:[XV]|I{0,2}).

Details

  • \b - starting word boundary, no letter, digit or _ should appear immediately to the left of the current position
  • ch - a literal substring
  • \D* - 0+ non-digit chars
  • \K - match reset operator
  • \d+ - 1+ digits
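  • VI{0,3}|I(?:[XV]|I{0,2}) - V followed by 0 to 3 Is (that is, Roman 5, 6, 7, 8), or I followed by X or V (that is, 9 and 4), or I followed by 0, 1 or 2 more Is (Roman 1, 2 and 3). Note that a standalone X (Roman 10) is not covered by these alternatives; the I?X alternative in the pattern from the comments handles it.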
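The following sketch compares the Roman-numeral pattern from this answer with the `I` to `X` variant posted in the comments. The constant names and the helper `firstChapterNumber` are assumptions made for illustration; only the two pattern strings are taken from the thread.

```php
<?php
// Pattern from the answer body: digits, or Roman I..IX.
const ANSWER_PATTERN = '~\bch.*?\K\b(?:\d+|VI{0,3}|I(?:[XV]|I{0,2}))\b~i';
// Pattern from the comment thread: digits, or Roman I..X (note the I?X branch).
const COMMENT_PATTERN = '~\bch.*?\K\b(?:\d+|I{1,3}|IV|VI{0,3}|I?X)\b~i';

// Hypothetical helper: returns the first number found after a "ch..." word,
// or null when the pattern does not match.
function firstChapterNumber(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$samples = [
    'chapter some word 12 or IV',
    'chapter some word IV some word',
    'chapter X',
];

foreach ($samples as $sample) {
    var_dump(firstChapterNumber(ANSWER_PATTERN, $sample));
    var_dump(firstChapterNumber(COMMENT_PATTERN, $sample));
}
```

Both patterns use a lazy `.*?`, so they return the first number after the "ch" word: "12" for the first sample and "IV" for the second. Only the comment variant matches the bare "X" in the third sample.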
Wiktor Stribiżew
  • Could you make it match Roman numerals (ivx only) too? \D* prevents it from matching any Roman numeral – niznet Oct 07 '19 at 13:20
  • @niznet Sorry, I think there is too little data: do you mean the Roman numbers as whole words may appear there? In between whitespace or punctuation? – Wiktor Stribiżew Oct 07 '19 at 13:22
  • @niznet Try `'~\bch.*?\K(?<!\w)(?:\d+|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))(?!\w)~i'`, see [this regex demo](https://regex101.com/r/SEByKM/1). Roman number regex is borrowed from [How do you match only valid roman numerals with a regular expression?](https://stackoverflow.com/a/267405/3832970). – Wiktor Stribiżew Oct 07 '19 at 13:25
  • Updated the question. I only need I , V , X (I until X) – niznet Oct 07 '19 at 13:27
  • @niznet From `I` to `X`: `'~\bch.*?\K\b(?:\d+|I{1,3}|IV|VI{0,3}|I?X)\b~i'`. See [this demo](https://regex101.com/r/SEByKM/3). – Wiktor Stribiżew Oct 07 '19 at 13:30