
I have been having difficulty extracting specific characters from a string using preg_replace(). All the strings follow the same consistent format, as shown in the two examples below.

I'm trying to extract the quantity integer, i.e., for the first example I would get 200 and for the second I would get 50.

Example Strings

$string = 'Sunscreen 25g (200 Quantity)';

$string = 'Lubricant 100ml (50 Quantity)';

Regex Code

$product = preg_replace('/(Sunscreen|Lubricant)/i', '', $string);

followed by:

$product = preg_replace('/(\(d*.Quantity\))/i', '$0', $product)

Expected Result

From the first string: int(200)
From the second string: int(50)

Any help would be appreciated. I cannot get the numbers just before "Quantity" and after the "(".

Dean

4 Answers


You don't need to throw multiple preg_ calls at this task; just match the whole string and capture only the digits that follow the first ( encountered. Replace the whole string with the captured digits. This way there is no temporary array to access: a string input is converted directly into the desired output string.

Code: (Demo)

$strings = [
    'Sunscreen 25g (200 Quantity)',
    'Lubricant 100ml (50 Quantity)',
    'WD-40 100ml (75 Quantity)',
];

foreach ($strings as $string) {
    echo preg_replace('~[^(]+\((\d+).*~', '$1', $string) . "\n";
}

Output:

200
50
75

In fact, preg_replace() can happily process an array of strings. (Demo)

var_export(preg_replace('~[^(]+\((\d+).*~', '$1', $strings));
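Because preg_replace() always returns strings, a minimal sketch that also produces the int values the question asks for could map intval() over the array result (the $quantities variable name is only illustrative):

```php
<?php
$strings = [
    'Sunscreen 25g (200 Quantity)',
    'Lubricant 100ml (50 Quantity)',
];

// Replace each whole string with its captured digits, then cast every result to int.
$quantities = array_map('intval', preg_replace('~[^(]+\((\d+).*~', '$1', $strings));

var_dump($quantities); // int(200) and int(50)
```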

Breakdown:

[^(]+    #match one or more non-left-parenthesis characters
\(       #match literal left parenthesis
(        #begin capture group 1
  \d+    #match one or more digits
)        #end capture group 1
.*       #match the remainder of the string
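PCRE's x (extended) modifier ignores unescaped whitespace in the pattern and treats # as the start of a comment, so the breakdown above can live inside the pattern itself. A minimal sketch of the same pattern written that way:

```php
<?php
// Same pattern as above, written in extended (x) mode with inline comments.
$pattern = '~
    [^(]+    # one or more non-left-parenthesis characters
    \(       # a literal left parenthesis
    (\d+)    # capture group 1: one or more digits
    .*       # the remainder of the string
~x';

echo preg_replace($pattern, '$1', 'Sunscreen 25g (200 Quantity)') . "\n"; // 200
```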

Alternatively, if you want to create an array containing the quantity digits (this is less direct because the target string then has to be extracted from the generated array), you can use preg_match(); there is definitely no reason to use preg_match_all(). \K restarts the full-string match, so no capture group is needed.

Code: (Demo) ...same output as above

foreach ($strings as $string) {
    echo (preg_match('~\(\K\d+~', $string, $match) ? $match[0] : 'no quantity') . "\n";
}
mickmackusa

I found a function in "How to get a substring between two strings in PHP?" and modified it to use only the last occurrence of '(', a technique also described in "How to get the last occurrence of a string?".

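function getValue($string) {
    $start = '(';
    $end = 'Quantity';
    // Position of the last '('; strict comparison so index 0 is not mistaken for "not found".
    $ini = strrpos($string, $start);
    if ($ini === false) {
        return '';
    }
    $ini += strlen($start);
    // Position of 'Quantity' after that '('.
    $endPos = strpos($string, $end, $ini);
    if ($endPos === false) {
        return '';
    }
    // Text between '(' and 'Quantity', e.g. "200", with surrounding spaces trimmed.
    return trim(substr($string, $ini, $endPos - $ini));
}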
        
$product= (int)getValue('Sunscreen 25g (200 Quantity)');
        
var_dump($product);
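If you would rather distinguish "no quantity found" from a quantity of 0, a hypothetical variant of getValue() (the name getQuantity() is invented here for illustration) could return ?int, with null when the string has no "(... Quantity" part:

```php
<?php
// Hypothetical variant of getValue(): returns an int, or null when no quantity is found.
function getQuantity(string $string): ?int
{
    $open = strrpos($string, '(');                // last '(' in the string
    if ($open === false) {
        return null;
    }
    $close = strpos($string, 'Quantity', $open);  // 'Quantity' after that '('
    if ($close === false) {
        return null;
    }
    $digits = trim(substr($string, $open + 1, $close - $open - 1));
    // Only accept a plain run of digits.
    return ctype_digit($digits) ? (int) $digits : null;
}

var_dump(getQuantity('Sunscreen 25g (200 Quantity)'));  // int(200)
var_dump(getQuantity('Lubricant 100ml (50 Quantity)')); // int(50)
var_dump(getQuantity('Lubricant 100ml'));               // NULL
```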
Zar Ni Ko Ko

Instead of doing 2 replacements, you could use a single pattern with a capturing group to get either 200 or 50.

Then you can convert group 1, which holds the digits, to an int, for example with intval().

\b(?:Sunscreen|Lubricant)\h+[^()]*\((\d+)\h+Quantity\)

Explanation

  • \b(?:Sunscreen|Lubricant) Word boundary, then match either one of the alternatives
  • \h+ Match 1+ horizontal whitespace chars
  • [^()]*\( Match 0+ times any char except ( and ), then match (
  • (\d+) Capture group 1, match 1+ digits (this is the value that you want)
  • \h+Quantity Match 1+ horizontal whitespace chars, then match Quantity
  • \) Match )

Regex demo | Php demo

For example

$re = '`\b(?:Sunscreen|Lubricant)\h+[^()]*\((\d+)\h+Quantity\)`';
$str = 'Sunscreen 25g (200 Quantity)
Lubricant 100ml (50 Quantity)';

preg_match_all($re, $str, $matches);

$result = array_map("intval", $matches[1]);
var_dump($result);

Output

array(2) {
  [0]=>
  int(200)
  [1]=>
  int(50)
}
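Since the question processes one $string at a time, a minimal sketch applying the same pattern to each string with preg_match() instead of preg_match_all(); returning null for a non-matching string is an assumption made here for illustration:

```php
<?php
$re = '`\b(?:Sunscreen|Lubricant)\h+[^()]*\((\d+)\h+Quantity\)`';

$quantities = [];
foreach (['Sunscreen 25g (200 Quantity)', 'Lubricant 100ml (50 Quantity)'] as $string) {
    // preg_match() returns 1 on a match and puts the captured digits in $m[1].
    $quantities[] = preg_match($re, $string, $m) ? (int) $m[1] : null;
}

var_dump($quantities);
```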

You might also make the match a bit more specific by matching the digits and the units:

\b(?:Sunscreen|Lubricant)\h+\d+(?:g|ml)\h+\((\d+)\h+Quantity\)

Regex demo
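A sketch of the more specific pattern in PHP. The third input line, with a kg unit, is made up to show the difference: the (?:g|ml) alternation rejects it, whereas the first pattern would still capture its 10.

```php
<?php
$re = '`\b(?:Sunscreen|Lubricant)\h+\d+(?:g|ml)\h+\((\d+)\h+Quantity\)`';
$str = 'Sunscreen 25g (200 Quantity)
Lubricant 100ml (50 Quantity)
Sunscreen 1kg (10 Quantity)';

preg_match_all($re, $str, $matches);

// Only the g and ml lines match, so the kg line contributes nothing.
$result = array_map('intval', $matches[1]);
var_dump($result);
```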

The fourth bird

In your examples, if the text around the numbers is not fixed and may change, you can use \d+ to extract every number from the string, for example:

$string = 'Sunscreen 25g (200 Quantity)';
preg_match_all('/\d+/', $string, $match);
print_r($match);

The result is:

Array
(
    [0] => Array
        (
            [0] => 25
            [1] => 200
        )

)
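$match[0] lists the numbers in order of appearance, so if you can assume the quantity is always the last number in the string (true for both examples, but an assumption in general), a minimal sketch could take the final element:

```php
<?php
$string = 'Sunscreen 25g (200 Quantity)';
preg_match_all('/\d+/', $string, $match);

// Assumes the quantity is the last number in the string.
$numbers = $match[0];
$quantity = $numbers ? (int) end($numbers) : null;

var_dump($quantity); // int(200)
```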

But if your strings are stable (always Sunscreen or Lubricant), you can use this regex to capture the number right after the product name:

$string = 'Sunscreen 25g (200 Quantity)';
preg_match_all('/Sunscreen ([\d\.]*)/i', $string, $match);
print_r($match);

$string = 'Lubricant 100ml (50 Quantity)';
preg_match_all('/Lubricant ([\d\.]*)/i', $string, $match);
print_r($match);

And again, the results are:

Array
(
    [0] => Array
        (
            [0] => Sunscreen 25
        )

    [1] => Array
        (
            [0] => 25
        )

)


Array
(
    [0] => Array
        (
            [0] => Lubricant 100
        )

    [1] => Array
        (
            [0] => 100
        )

)

Or, more simply, to capture the number right before Quantity:

$string = 'Sunscreen 25g (200 Quantity)';
preg_match_all('/([\d\.]*) Quantity/i', $string, $match);
print_r($match);

Result:

Array
(
    [0] => Array
        (
            [0] => 200 Quantity
        )

    [1] => Array
        (
            [0] => 200
        )

)
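Since each string holds a single quantity, preg_match() is enough here, and the captured group can be cast to int. A minimal sketch; note it changes * to + so the group cannot match an empty string:

```php
<?php
$string = 'Sunscreen 25g (200 Quantity)';
$quantity = null;

// preg_match() stops at the first match; group 1 holds the digits before " Quantity".
if (preg_match('/([\d.]+) Quantity/i', $string, $match)) {
    $quantity = (int) $match[1];
}

var_dump($quantity); // int(200)
```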
AmirAli Esteki
  • 1. This is not what the OP has asked for. 2. The pattern modifiers `m` and `u` are absolutely pointless in your patterns. 3. A dot inside of a character class does not need escaping. ...So you are teaching people incorrect techniques. – mickmackusa Sep 01 '20 at 05:49
  • @mickmackusa 1. The asker wants to extract quantities, and preg_replace() is not the tool for getting data; the correct way is preg_match() or preg_match_all(). 2. You're right, I added them out of habit. I will remove them now. 3. If you remove \. from my regexes, decimal numbers in the strings are not matched: for example, "200.5 Quantity" returns 5, but with \. the regex returns "200.5 Quantity". – AmirAli Esteki Sep 01 '20 at 06:00
  • A dot inside of a character class does not need a backslash. – mickmackusa Sep 01 '20 at 06:09
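To illustrate the point from these comments with a made-up input, "200.5 Quantity": [\d.] and [\d\.] behave identically, while dropping the dot entirely captures only 5.

```php
<?php
$input = 'Sunscreen 25g (200.5 Quantity)';

// Inside a character class '.' is already a literal dot, so both patterns are equivalent.
$results = [];
foreach (['/([\d.]+) Quantity/', '/([\d\.]+) Quantity/'] as $pattern) {
    preg_match($pattern, $input, $match);
    $results[] = $match[1];
}

// Without the dot in the class, only the digits directly before " Quantity" match.
$withoutDot = preg_match('/(\d+) Quantity/', $input, $m) ? $m[1] : null;

var_dump($results, $withoutDot);
```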