Assume we have this text:

...
settingsA=9, 4.2 
settingsB=3, 1.5, 9, 2, 4, 6
settingsC=8, 3, 2.5, 1
...

How can I capture all the numbers in a specific row in a single step?

Single step means:

  • single regex pattern.
  • single operation (no loops or splits, etc.)
  • all matches are captured in one array.

Let's say I want to capture all the numbers in the row that starts with settingsB=. The final result should look like this:

3
1.5
9
2
4
6

My failed attempts:

<?php
    $subject =
        "settingsA=9, 4.2
         settingsB=3, 1.5, 9, 2, 4, 6
         settingsC=8, 3, 2.5, 1";

    // Each attempt was tried on its own; as written, only the last assignment takes effect.
    $pattern = '/([\d\.]+)(, )?/'; // FAILED!
    $pattern = '/(?:settingsB=)(?:([\d\.]+)(?:, )?)/'; // FAILED!
    $pattern = '/(?:settingsB=)(?:([\d\.]+)(?:, )?)+/'; // FAILED!
    $pattern = '/(?<=^settingsB=|, )([\d+\.]+)/'; // FAILED!

    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    if ($matches) {
        print_r($matches);
    }
?>

UPDATE 1: @Saleem's example uses multiple steps instead of a single step, unfortunately. I'm not saying that his example is bad (it actually works), but I want to know if there is another way to do it and how. Any ideas?

UPDATE 2: @bobble bubble provided a perfect solution for this challenge.

OlavH

2 Answers

You can use the \G anchor to glue each match to the end of the previous match. The pattern below also uses \K to reset the start of the reported match just before the desired part; it works in the PCRE regex flavor.

(?:settingsB *=|\G(?!^) *,) *\K[\d.]+
  • (?: opens a non-capturing group for the alternation
  • settingsB *= matches settingsB, followed by any number of spaces, followed by a literal =
  • |\G(?!^) or continues where the previous match ended, but not at the start of the string
  •  *, (space, star, comma) matches a comma preceded by optional spaces
  • ) closes the alternation (non-capturing group)
  •  *\K (space, star, \K) skips optional spaces, then resets the start of the reported match
  • [\d.]+ matches one or more digits and periods

If the sequence contains tabs or newlines, use \s for whitespace character instead of space.
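A side note on the last bullet: the class [\d.]+ accepts any run of digits and periods, so it would also accept tokens such as 1.2.3 or a bare period. The sketch below compares it with a stricter alternative, \d+(?:\.\d+)?, which is an assumption about what counts as a number here and is not part of the pattern above. Python is used only because it already appears in this thread; its built-in re module supports neither \G nor \K, so this checks the number sub-pattern in isolation. The names LOOSE, STRICT and fully_matches are illustrative.

```python
import re

# Number sub-pattern taken from the answer's last bullet.
LOOSE = re.compile(r"[\d.]+")
# Assumption: a "number" is an integer or a simple decimal such as 1.5.
STRICT = re.compile(r"\d+(?:\.\d+)?")


def fully_matches(pattern, token):
    """Return True if the whole token is matched by the pattern."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["3", "1.5", "1.2.3", "..", "7."]:
        print(token, fully_matches(LOOSE, token), fully_matches(STRICT, token))
```

If the input is known to contain only well-formed numbers, as in the sample text, the looser class is sufficient and slightly shorter.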

See demo at regex101 or PHP demo at eval.in

Alternatively, this more compatible pattern uses a capturing group instead of \K, so it should work in any regex flavor that supports the \G anchor (Java, .NET, Ruby...). The numbers are then in group 1 of each match:

(?:settingsB *=|\G(?!^) *,) *([\d.]+)

bobble bubble

Here is a Python solution; I will post the PHP regex later. Python and PHP regex syntax are quite similar.

(?<=settingsB=)(\d+(?:\.\d+)?(?:, )?)+

Python:

import re

subject = """
...
settingsA=9, 4.2
settingsB=3, 1.5, 9, 2, 4, 6
settingsC=8, 3, 2.5, 1
...
"""

rx = re.compile(r"(?<=settingsB=)(\d+(?:\.\d+)?(?:, )?)+", re.IGNORECASE)
result = rx.search(subject)

if result:
    numString = result.group()

    for n in [f.strip() for f in numString.split(',')]:
        print(n)

PHP

$subject =
    "settingsA=9, 4.2
     settingsB=3, 1.5, 9, 2, 4, 6
     settingsC=8, 3, 2.5, 1";

$pattern = '/(?<=settingsB=)(\d+(?:\.\d+)?(?:, )?)+/i';
preg_match($pattern, $subject, $matches);

if ($matches) {

    $num = explode(",", $matches[0]);

    for ($i = 0; $i < count($num); $i++) {
        print(trim($num[$i]) . "\n");
    }
}

Output:

3
1.5
9
2
4
6
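
One thing worth noting about this pattern: the numbers are recovered by splitting group 0, because a repeated capturing group keeps only its last repetition. A minimal sketch using the same pattern and sample text with Python's re module (the variable names are illustrative) shows the difference between the whole match and group 1:

```python
import re

subject = (
    "settingsA=9, 4.2\n"
    "settingsB=3, 1.5, 9, 2, 4, 6\n"
    "settingsC=8, 3, 2.5, 1"
)

# Same pattern as in the answer; the outer group is repeated with +.
rx = re.compile(r"(?<=settingsB=)(\d+(?:\.\d+)?(?:, )?)+")
match = rx.search(subject)

whole_match = match.group(0)  # everything consumed by all repetitions
last_repeat = match.group(1)  # only the final repetition is retained

print(repr(whole_match))  # '3, 1.5, 9, 2, 4, 6'
print(repr(last_repeat))  # '6'
```

This is why a tool that displays capture groups may appear to show only the last number, even though the overall match covers the full list.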
Saleem
  • I pasted that into regex101.com and it only matches the last number - even when the regex is set to Python. Any idea why? – Jerry Jeremiah Mar 07 '16 at 00:33
  • @Saleem, thank you for replying, but there must be a better solution. I want to return the result in an array **without splitting**, eg. `(numString.split)`. Let regex do the splitting. Any ideas? – OlavH Mar 07 '16 at 00:40
  • @JerryJeremiah, it's because regex101 shows last group only. Please see https://regex101.com/r/wD4vQ5/1 – Saleem Mar 07 '16 at 00:44
  • @OlavH, yes, I agree. There should be a better solution, but something is better than nothing on short notice :) – Saleem Mar 07 '16 at 00:44
  • Just for fun: you know, `premature optimization is the root of all evil`. One solution is to apply two different regexes: one to capture the whole string, as I did initially, and then another on the result of that capture to eliminate the comma `,` – Saleem Mar 07 '16 at 00:45
  • Updated solution for PHP – Saleem Mar 07 '16 at 00:56
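
The two-regex idea mentioned in the comments could look roughly like the sketch below. It does not meet the question's single-pattern requirement; it only illustrates that approach. The function name numbers_in_row and the stricter number pattern \d+(?:\.\d+)? are assumptions made for the example.

```python
import re


def numbers_in_row(text, key):
    """Return the numbers listed after `key=` on its row, as strings."""
    # Step 1: isolate the rest of the row that starts with `key=`
    # (leading spaces or tabs before the key are tolerated).
    row = re.search(rf"^[ \t]*{re.escape(key)}=(.*)$", text, re.MULTILINE)
    if row is None:
        return []
    # Step 2: let a second regex pull out the numbers, so no split is needed.
    return re.findall(r"\d+(?:\.\d+)?", row.group(1))


subject = (
    "settingsA=9, 4.2\n"
    "settingsB=3, 1.5, 9, 2, 4, 6\n"
    "settingsC=8, 3, 2.5, 1"
)

if __name__ == "__main__":
    print(numbers_in_row(subject, "settingsB"))  # ['3', '1.5', '9', '2', '4', '6']
```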