0

I am scraping parts of a webpage and then inserting the results into MySQL.

The source code of a problem area is:

<span class="profilelastlogin">
                    31,
                Kiev, Ukraine
                </span>

I want to select the three items (age, city, country) and assign each of them to its own variable.

I am using this regex to select the full string, but it doesn't work. I would appreciate any guidance.

$regexAgeCityCountry = '/<span class="profilelastlogin">(.*?)<\/span>/';
preg_match_all($regexAgeCityCountry, $page, $outputAgeCityCountry);
Biffen
h-y-b-r-i-d

4 Answers

1

You can use the s (PCRE_DOTALL) modifier so that '.' also matches newline characters, which lets the pattern span several lines.

Here is the PHP reference:

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

Here is a working example with a fix
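As a minimal sketch of that fix (the sample string is copied from the question; the variable names $withoutS, $withS and $inner are only illustrative), the same pattern can be run with and without the s modifier:

```php
<?php
// Sample markup copied from the question, including its line breaks.
$page = "<span class=\"profilelastlogin\">\n                    31,\n                Kiev, Ukraine\n                </span>";

// Without the s modifier, '.' stops at newlines, so the pattern cannot reach </span>.
$withoutS = preg_match('/<span class="profilelastlogin">(.*?)<\/span>/', $page, $m1);

// With the s modifier, '.' also matches newlines and the whole span is captured.
$withS = preg_match('/<span class="profilelastlogin">(.*?)<\/span>/s', $page, $m2);

// The capture still carries the surrounding whitespace, so trim it.
$inner = $withS ? trim($m2[1]) : '';

var_dump($withoutS, $withS, $inner);
```

The captured text still contains the line break between the age and the city, so it needs further splitting before it can be stored in separate columns.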

1

Why not just match 3 separate groups?

 /<span class="profilelastlogin">(.*?),(.*?),(.*?)<\/span>/s

Group 1 contains the age, group 2 the city and group 3 contains the country.

You could also use this regex to make sure the age is always numeric (the \s* allows for the whitespace that surrounds the number in the source):

/<span class="profilelastlogin">\s*([0-9]+)\s*,(.*?),(.*?)<\/span>/s
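A possible way to turn the three groups into individual variables is sketched below. It assumes the markup from the question, uses a variant of the numeric pattern that tolerates whitespace around the age, and trims the captures because groups 2 and 3 include the line breaks and indentation from the page. The names $age, $city and $country are illustrative.

```php
<?php
// Sample markup copied from the question.
$page = "<span class=\"profilelastlogin\">\n                    31,\n                Kiev, Ukraine\n                </span>";

// \s* skips the whitespace around the age; s lets '.' cross line breaks.
$pattern = '/<span class="profilelastlogin">\s*([0-9]+)\s*,(.*?),(.*?)<\/span>/s';

$age = $city = $country = null;
if (preg_match($pattern, $page, $m)) {
    $age     = (int) $m[1];   // group 1: digits only
    $city    = trim($m[2]);   // group 2: text up to the next comma
    $country = trim($m[3]);   // group 3: remainder before </span>
}

var_dump($age, $city, $country);
```

Note that this splits on the first two commas, so a city name that itself contains a comma would end up partly in $country.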
georg
Rolf ツ
  • Thank you. I believe you have a typo, / not \, at the beginning and end, but yes, that works great: Array ( [0] => Array ( [0] => 34, Simferopol, Russian Federation ) [1] => Array ( [0] => 34 ) [2] => Array ( [0] => Simferopol ) [3] => Array ( [0] => Russian Federation ) ) – h-y-b-r-i-d Feb 16 '15 at 10:49
0

Put all the data in one variable first, then split it on the commas:

$arr = explode(',', $yourvariable);

$age = trim($arr[0]);

$city = trim($arr[1]);

$country = trim($arr[2]);
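A slightly more defensive version of the same idea might look like the sketch below. It assumes $yourvariable already holds the text found between the span tags (the sample value is taken from the question) and checks that exactly three parts came back before assigning them; $parts is an illustrative name.

```php
<?php
// Assumed input: the text between <span ...> and </span>, as in the question.
$yourvariable = "\n                    31,\n                Kiev, Ukraine\n                ";

// Split on commas and strip the whitespace around each piece.
$parts = array_map('trim', explode(',', $yourvariable));

$age = $city = $country = null;
if (count($parts) === 3) {
    // Only assign when the expected three fields are present.
    [$age, $city, $country] = $parts;
}

var_dump($age, $city, $country);
```

Here $age stays a string; cast it with (int) if the MySQL column is numeric.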
Casimir et Hippolyte
Param sohi
0
<span class="profilelastlogin">\s+\K|\G(?!^)([^,]+),?\s*(?=[\s\S]*<\/span>)

You can try this to capture the 3 parts. See the demo:

https://www.regex101.com/r/rK5lU1/28

$re = "/<span class=\"profilelastlogin\">\\s+\\K|\\G(?!^)([^,]+),?\\s*(?=[\\s\\S]*<\\/span>)/mi";
$str = "<span class=\"profilelastlogin\">\n 31,\n Kiev, Ukraine\n </span>";

preg_match_all($re, $str, $matches);
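The captures from this pattern land in $matches[1], and they may include an empty entry for the first alternative as well as trailing whitespace on the last part. A minimal sketch of cleaning them up, assuming the same $re and $str as above (the name $parts is illustrative):

```php
<?php
$re  = "/<span class=\"profilelastlogin\">\\s+\\K|\\G(?!^)([^,]+),?\\s*(?=[\\s\\S]*<\\/span>)/mi";
$str = "<span class=\"profilelastlogin\">\n 31,\n Kiev, Ukraine\n </span>";

preg_match_all($re, $str, $matches);

// Trim each capture, then drop empty strings and reindex from 0.
$parts = array_values(array_filter(array_map('trim', $matches[1]), 'strlen'));

$age = $city = $country = null;
if (count($parts) === 3) {
    [$age, $city, $country] = $parts;
}

var_dump($parts);
```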
vks