Say I have the following string:

BlahBlah........1.000
Whatevah....2.000
Something......6.500

...that is, some text, followed by four or more dots, followed by a number (that may have a dot as a delimiter) followed by a newline (Linux or Windows, I don't know if that's important). It's a part of a larger string.

How do I extract the text and numbers into variables? More precisely an array of value pairs (array of arrays). I just can't get my head around regular expressions yet... :(

johnsyweb
DMIL
  • In fact the number doesn't have to be a number, it can be anything followed by a newline. – DMIL Dec 11 '11 at 08:39

1 Answer

Use this regex:

(?<word>\w+)\.+(?<number>\d+(\.\d+)?)

with preg_match_all():

preg_match_all("/(?<word>\w+)\.+(?<number>\d+(\.\d+)?)/", $yourString, $theArrayYouWantToStoreMatchesInIt);
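To get the "array of value pairs" the question asks for, one option is to pass the `PREG_SET_ORDER` flag so that results are grouped per match rather than per capture group. The following is a minimal sketch using the sample lines from the question; the variable names `$input`, `$matches` and `$pairs` are illustrative only.

```php
<?php
// Sample input taken from the question; variable names are illustrative.
$input = "BlahBlah........1.000\nWhatevah....2.000\nSomething......6.500\n";

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('/(?<word>\w+)\.+(?<number>\d+(\.\d+)?)/', $input, $matches, PREG_SET_ORDER);

// Reduce each match to a [text, number] pair.
$pairs = array_map(
    function ($m) {
        return [$m['word'], $m['number']];
    },
    $matches
);

print_r($pairs);
```

Here `$pairs` should end up holding three two-element arrays, one per input line, each with the text first and the number (still a string) second.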

To capture anything after four or more dots, you can use this:

(?<word>\w+)\.{4,}(?<anything>.*)

The following will also capture strings that have spaces in their first part:

(?<beforeDots>[^.\r\n]+)\.{4,}(?<afterDots>.*)
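As a rough sketch of how a pattern like this might be applied to a multi-line string with Windows line endings: the `m` flag, the `^`/`$` anchors, the lazy `.*?` and the trailing `\r?` are additions of this example, not part of the answer above, and the input lines (including the deliberately non-matching `Too.few...dots`) are made up for illustration.

```php
<?php
// Hypothetical input: spaces before the dots, extra dots after them, and CRLF line endings.
$input = "Blah blah blah......1.000 EUR\r\nBlahblah.....ok..thistoo\r\nToo.few...dots\r\n";

// With the m flag, ^ and $ anchor each line; \r? keeps a carriage return out of the capture.
// [^.\r\n]+ stops at the first dot and never crosses a line break.
$pattern = '/^(?<beforeDots>[^.\r\n]+)\.{4,}(?<afterDots>.*?)\r?$/m';

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Collect [beforeDots, afterDots] pairs.
$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m['beforeDots'], $m['afterDots']];
}

print_r($pairs);
```

The third line is skipped because it never contains a run of four dots. Note that the first part still cannot contain a dot of its own, since the character class excludes it.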

It's also a good idea to limit the matching text to a certain range of characters to make the regex more accurate:

(?<beforeDots>[a-zA-Z0-9 ]+)\.{4,}(?<afterDots>[a-zA-Z0-9\. ]+)
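One thing worth knowing about the restricted version is what happens when a line contains a character outside the allowed classes. The sketch below runs the pattern against two lines; the second (`Price....6,50`) is a hypothetical input chosen because the comma is not in the character class.

```php
<?php
$pattern = '/(?<beforeDots>[a-zA-Z0-9 ]+)\.{4,}(?<afterDots>[a-zA-Z0-9\. ]+)/';

// Hypothetical lines: the second one contains a comma, which is outside the allowed class.
$lines = ["Something......6.500", "Price....6,50"];

$results = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $results[] = [$m['beforeDots'], $m['afterDots']];
    }
}

print_r($results);
```

The second pair comes back as `['Price', '6']`: the match stops at the comma rather than failing. If truncated values are a concern, anchoring the pattern to the end of the line is one possible way to reject such lines instead.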
fardjad
  • Hi, here is an update that avoids capturing the 3rd group (via ?:) and also matches non-decimal numbers: (?<word>\w+)\.+(?<number>\d+(?:\.?\d+))? – Alex Emilov Dec 11 '11 at 08:48
  • Can you please modify it to match anything after the 4+ dots including more dots? For example Blahblah.....ok..thistoo so it returns Blahblah and ok..thistoo ? And also, not to match it if there's less than four dots? – DMIL Dec 11 '11 at 08:51
  • Replace `\.+` with `\.{4,}` if you want to enforce "followed by four or more dots". – johnsyweb Dec 11 '11 at 08:54
  • Please help me just with this, what if there is a white space in the first or second part of the line, for example Blah blah blah......1.000 EUR it matches only the last blah and I'd like it to match everything since either the newline or the beginning of the string. I tried replacing \w+ with .* but it matches the dots as well so I need to qualify that as any number of any characters except a series of four or more dots. – DMIL Dec 11 '11 at 09:37
  • Never mind, got it. Is there anything regular expressions can't do?? – DMIL Dec 11 '11 at 09:51
  • Oops, thanks, your solution is better than mine. Thanks for the tip about regular-expressions.info – DMIL Dec 11 '11 at 09:52
  • @DMIL `Is there anything regular expressions can't do`: matching and parsing non-regular grammars. – fardjad Dec 11 '11 at 09:55