5

I have a string with text, numbers, and symbols. I'm trying to extract the numbers and symbols from the string, with limited success: instead of getting the entire run of numbers and symbols, I'm only getting part of it. I explain my regex below to make it clearer and easier to understand.

\d : any digit
[+,-,*,/,0-9]+ : 1 or more of any of +, -, *, /, or a digit
\d : any digit

Code:

$string = "text 1+1-1*1/1= text";

$regex = "~\d[+,-,*,/,0-9]+\d~siU";
preg_match_all($regex, $string, $matches); 

echo $matches[0][0];

Expected Results

1+1-1*1/1

Actual Results

1+1
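A minimal runnable reproduction, assuming PHP 7 or later, that prints every match instead of only `$matches[0][0]`. The second call drops only the `U` flag, which shows that the character class itself also stops each match at the `-`.

```php
<?php
$string = "text 1+1-1*1/1= text";

// The question's pattern, printing every match rather than just the first.
preg_match_all('~\d[+,-,*,/,0-9]+\d~siU', $string, $withU);
echo implode(', ', $withU[0]), "\n"; // 1+1, 1*1

// Same pattern without U: longer matches, but still split at the "-".
preg_match_all('~\d[+,-,*,/,0-9]+\d~si', $string, $withoutU);
echo implode(', ', $withoutU[0]), "\n"; // 1+1, 1*1/1
```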
jessica
  • A big improvement over your [previous question](http://stackoverflow.com/questions/32855791/string-to-symbols-and-numbers). Good luck. – John Conde Sep 30 '15 at 00:54
  • Throw that thing into https://regex101.com and take a look at the upper-right box. – Rizier123 Sep 30 '15 at 00:55
  • @Rizier123 Did you see this? \d : any number [+,-,*,/,0-9]+ : 1 or more of any +,-,*,/, or number \d : any number – jessica Sep 30 '15 at 00:58
  • @jessica Yes, I saw it. And now you probably want to throw your regex ^^ in there and see what it actually does – Rizier123 Sep 30 '15 at 01:01
  • It does what I said it does above. Do you see the expected results and the actual results above in bold? Those are the results. Same in regex101. – jessica Sep 30 '15 at 01:05
  • @Rizier123 I'm asking WHY it does that even though my regex expression is correct? – jessica Sep 30 '15 at 01:06

3 Answers

3

Remove the U flag. It makes the + nongreedy (lazy) in its matching. Also, you don't need commas between characters in your character class. (You only need one , if you're trying to match a comma.) You do need to escape - so that it isn't interpreted as a range.
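A minimal sketch of the fix described above, assuming PHP 7 or later: commas removed, the hyphen escaped, and the `U` flag dropped. The second call keeps `U` only to show that a corrected class is not enough while the quantifier is still lazy.

```php
<?php
$string = "text 1+1-1*1/1= text";

// Commas removed, "-" escaped, no U flag: the + quantifier is greedy again.
preg_match_all('~\d[+\-*/0-9]+\d~', $string, $greedy);
echo implode(', ', $greedy[0]), "\n"; // 1+1-1*1/1

// Same character class, but U makes + lazy, so each match ends at the
// first digit that can close it.
preg_match_all('~\d[+\-*/0-9]+\d~U', $string, $lazy);
echo implode(', ', $lazy[0]), "\n"; // 1+1, 1*1
```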

  • :) Finally. Someone who found the problem! It seems siU was completely unnecessary in the regex above. Thanks for pointing that out. – jessica Sep 30 '15 at 01:34
  • @jessica np. Does it really still work with the commas? (Never tried that before and can't test from my phone.) –  Sep 30 '15 at 01:36
  • No. I've already removed the commas, as the commas were part of the problem, as @dxdy pointed out. But I think the siU were the main part of the problem, and since you addressed both the commas and the siU, you have the best answer. – jessica Sep 30 '15 at 01:38
  • @jessica I told you to remove the 'U' flag, but I think you just completely ignored the 'get rid of the U flag'... – Sir McPotato Sep 30 '15 at 01:46
  • @vinxce When you said "for small things like that..." it was unclear that the U flag was the problem. Edit it into your answer and take an upvote –  Sep 30 '15 at 01:50
  • @Vinxce Since you ignored my comment, I thought it would only be right if I ignore yours in reciprocation. :) – jessica Sep 30 '15 at 01:56
2

The problem here is that your regex mixes up quite a few unescaped metacharacters. In your character class you have [+,-,*,/,0-9]. You do not need to separate characters with commas; doing so only tells the regex engine to include commas in the class. Furthermore, you need to escape the -, as it has a special meaning inside a character class. As written, ,-, is interpreted as "the characters from , to ," instead of the literal character "-". The "/" character only needs escaping when / is also the pattern delimiter; with ~ as the delimiter it can stay as is. The expression \d[+\-*/0-9]+\d should do the trick.
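A quick way to check what a character class actually contains is to test single characters against it. This sketch, assuming PHP 7 or later, anchors each class with `^...$` and reports which of a few sample characters it accepts.

```php
<?php
// Which single characters does each character class accept?
$classes = [
    'original' => '~^[+,-,*,/,0-9]$~',
    'fixed'    => '~^[+\-*/0-9]$~',
];
$results = [];

foreach ($classes as $name => $pattern) {
    $results[$name] = [];
    foreach (['+', '-', '*', '/', ',', '7'] as $char) {
        if (preg_match($pattern, $char) === 1) {
            $results[$name][] = $char;
        }
    }
    echo $name, ': ', implode(' ', $results[$name]), "\n";
}
// original: + * / , 7
// fixed: + - * / 7
```

The original class accepts the comma but not the hyphen, which is why matches stop at every `-` in the input.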

dxdy
  • ...Your eyes are misleading you...I DO already have a + sign at the end. – jessica Sep 30 '15 at 01:02
  • My eyes were indeed misleading me. Sorry about that. The problem lies somewhere else, check out the edited answer. – dxdy Sep 30 '15 at 01:13
  • As far as I know, metacharacters do not need to be escaped inside [] brackets. – jessica Sep 30 '15 at 01:20
  • Most of them do not, but some do indeed. Among them are `\ ^ ] -`. Consider this: how else is the engine supposed to know whether `0-9` means "the characters from 0 to 9" or "the characters 0, - and 9"? – dxdy Sep 30 '15 at 01:25
  • Only \, is considered an actual comma. The other unescaped commas are treated as separators. – jessica Sep 30 '15 at 01:26
  • Unfortunately, the syntax does not work this way. Compare with [this](http://php.net/manual/en/regexp.reference.character-classes.php) page from the documentation. It explicitly states that "The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class." – dxdy Sep 30 '15 at 01:27
  • @jessica I said, you should throw that regex in there: http://stackoverflow.com/questions/32856068/using-regex-to-extract-numbers-and-symbols-from-string#comment53545709_32856068 – Rizier123 Sep 30 '15 at 01:29
  • @dxdy It seems that you are correct. My apologies. However, \ does not need to be escaped inside [] brackets. Also, you seem to have missed a comma when you were removing them, in your answer "\d[+\-,*\/0-9]+\d". – jessica Sep 30 '15 at 01:33
  • @dxdy Still. You made an effort in trying to help me so I'll give you an upvote for effort. – jessica Sep 30 '15 at 01:36
  • I did indeed miss that comma. Edited the answer to reflect that. – dxdy Sep 30 '15 at 01:43
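The PHP documentation quoted in the comments above gives three ways to put a literal hyphen in a class: escape it, or place it first or last. A minimal sketch, assuming PHP 7 or later and no `U` flag, showing that all three forms match the same text.

```php
<?php
$string = "text 1+1-1*1/1= text";

// A literal "-" inside a class: escaped, placed first, or placed last.
$patterns = [
    '~\d[+\-*/0-9]+\d~',
    '~\d[-+*/0-9]+\d~',
    '~\d[+*/0-9-]+\d~',
];
$results = [];

foreach ($patterns as $pattern) {
    preg_match($pattern, $string, $m);
    $results[] = $m[0];
    echo $pattern, ' => ', $m[0], "\n"; // each prints 1+1-1*1/1
}
```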
-3

Didn't test it with your code but should work :)

((?:[0-9]+[+\-*\/]?)+)

For more detail on how the pattern works, see: https://regex101.com/r/mF0zO8/2
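A minimal sketch that runs this alternative pattern through `preg_match_all`, assuming PHP 7 or later and `~` as the delimiter. Inside a character class `|` is a literal pipe rather than alternation, so the class here lists only the four operators.

```php
<?php
// Digits, each optionally followed by one operator, repeated.
// Inside [...] a "|" is a literal pipe, not alternation, so the class
// below lists only the four operators.
$pattern = '~((?:[0-9]+[+\-*\/]?)+)~';

preg_match_all($pattern, "text 1+1-1*1/1= text", $matches);
echo implode(', ', $matches[1]), "\n"; // 1+1-1*1/1

// Unlike \d[...]+\d, this also matches a lone number and a trailing operator.
preg_match_all($pattern, "a 42 b 3* c", $other);
echo implode(', ', $other[1]), "\n"; // 42, 3*
```

Whether matching `42` or `3*` is desirable depends on the input; the `\d[...]+\d` form from the other answers always requires a digit at both ends.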

Sir McPotato