
The following 2 lines are my code:

$rank_content = file_get_contents('https://www.championsofregnum.com/index.php?l=1&ref=gmg&sec=42&world=2');
$tmp_ = preg_replace("/.+width=.16.> /Uis", "", $rank_content, 1);

The second line above causes an infinite loop. By contrast, the following alternatives DO work:

$tmp_ = preg_replace("/.+width=.16.> /Ui", "", $rank_content, 1);
$tmp_ = preg_replace("/[^§]+width=.16.> /Uis", "", $rank_content, 1);

Sadly, neither of them gives me what I want: neither alternative matches across the line breaks in $rank_content.
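One likely reason the `s` version appears to hang: when the marker is not present, an unanchored pattern like this rescans the rest of the string from every starting offset, which is roughly quadratic in the length of the input. Anchoring the pattern at the start of the string keeps the `s` modifier but limits PCRE to a single attempt. A minimal sketch, assuming the marker is `width="16"` (as quoted from the page source in the comments) and using a hypothetical sample string in place of the downloaded page:

```php
<?php
// Hypothetical stand-in for the downloaded page.
$rank_content = "<html>\n<body>\n<img alt=\"realm\" width=\"16\" src=\"a.png\">rest";

// \A anchors the match at the very start of the string, so PCRE only tries
// offset 0 instead of retrying from every position; /s lets "." match line
// breaks; the lazy ".*?" stops at the first occurrence of the marker.
$tmp_ = preg_replace('/\A.*?width="16"/s', '', $rank_content, 1);
```

Here `$tmp_` keeps everything after the first `width="16"`, line breaks included.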

Also, if I replace the file_get_contents call with something like

$rank_content = "asdfas\nasdfasdfaswidth=m16m> teststring";

there is no problem either, even though \n represents a line break too, doesn't it?
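As Marek's comment points out, PHP only turns `\n` into a newline inside double-quoted strings; in single quotes it stays a literal backslash followed by `n`. A minimal illustration with made-up variable names:

```php
<?php
$double = "a\nb"; // 3 bytes: 'a', a real newline, 'b'
$single = 'a\nb'; // 4 bytes: 'a', '\', 'n', 'b' (no escape processing)
```

So the test string above, being double-quoted, really does contain a line break.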

So am I right in understanding that regex has trouble matching a string that contains line breaks?
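Regex handles line breaks fine; what matters is whether `.` is allowed to match them. A minimal sketch with a made-up sample string, mirroring the question's `U` (ungreedy) modifier:

```php
<?php
$text = "first line\nsecond line width=\"16\" rest";

// Without /s, "." does not match "\n", so the match can only start on the
// line that contains the marker; the first line survives.
$noS = preg_replace('/.+width="16"/U', '', $text, 1);

// With /s, "." also matches "\n", so everything up to the marker is removed.
$withS = preg_replace('/.+width="16"/Us', '', $text, 1);
```

`$noS` is `"first line\n rest"`, while `$withS` is `" rest"`.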

How can I extract a substring of $rank_content (which spans multiple lines) by removing everything before the point where something like width="16" appears? (This can be seen in the site's source code.)
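Since the goal is only to drop everything up to a fixed marker, a plain string search may be enough, with no regex at all. A minimal sketch, assuming PHP 7.1+ and treating `width="16"` as the marker; the function name `strip_until` is hypothetical:

```php
<?php
// Hypothetical helper: return the part of $haystack after the first
// occurrence of $marker, or null if the marker does not occur.
function strip_until(string $haystack, string $marker): ?string
{
    $pos = strpos($haystack, $marker);
    if ($pos === false) {
        return null; // marker not found
    }
    return substr($haystack, $pos + strlen($marker));
}
```

Usage would be something like `$tmp_ = strip_until($rank_content, 'width="16"');`. strpos() is a single linear scan, so input length is not a concern here.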

phil294
  • No, `\n` represents line breaks only in double-quoted strings. – Marek Jun 26 '14 at 15:31
  • `{1}` is useless in a regex... – Niet the Dark Absol Jun 26 '14 at 15:31
  • thanks, I edited these two issues – phil294 Jun 26 '14 at 15:34
  • edited again - s instead of m – phil294 Jun 26 '14 at 15:35
  • I don't see anything on the linked page that matches `width=.16.>`. Was that a mistake? – Mr. Llama Jun 26 '14 at 15:56
  • the page's source code contains a bunch of fragments like `realm." width="16" src="include..."` – phil294 Jun 26 '14 at 16:53
  • It seems the problem is the LENGTH of the haystack variable $rank_content. Its length is about 90,000, while the maximum allowed length for regex match() is about 30,000. For those interested: http://stackoverflow.com/questions/8268624/php-preg-match-all-limit I myself am going to solve the problem using another method for reading the contents of a website, like HtmlUnit. – phil294 Jun 26 '14 at 20:02
  • What you've got here is an [x/y problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You haven't described what you're trying to do and have focussed entirely on the problems of the solution you've chosen to do it. Also, the description is quite misleading - there is no infinite loop, it's very slow, but it'll probably complete if you leave it long enough (it did for me); and it's so slow because of the regex you're using. – AD7six Jun 26 '14 at 23:51

2 Answers


Replace the `m` modifier with the `s` modifier. `m` changes the behaviour of `^` and `$`, whereas `s` changes the behaviour of `.`.
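A small illustration of the difference between the two modifiers, using a made-up two-line string:

```php
<?php
$text = "alpha\nbeta";

// m: "^" and "$" match at every line start/end, not just the string's.
$caretDefault  = preg_match_all('/^\w+/', $text);  // only "alpha"
$caretMulti    = preg_match_all('/^\w+/m', $text); // "alpha" and "beta"
$dollarDefault = preg_match('/alpha$/', $text);    // no match
$dollarMulti   = preg_match('/alpha$/m', $text);   // match

// s: "." also matches the newline character.
$dotDefault = preg_match('/alpha.beta/', $text);   // no match
$dotAll     = preg_match('/alpha.beta/s', $text);  // match
```

Only `s` affects whether `.+` can run across line breaks, which is what the question needs.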

That said, you should not be parsing HTML with regex. Seriously. Bad things happen.
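If a real parser is an option, PHP's DOM extension can locate the same elements without any regex. A minimal sketch, assuming the rows of interest contain `<img>` tags with `width="16"` (as quoted from the page source in the comments) and using a hypothetical HTML snippet in place of the real page:

```php
<?php
// Hypothetical stand-in for the downloaded page.
$html = '<table>'
      . '<tr><td><img alt="realm" width="16" src="a.png">first</td></tr>'
      . '<tr><td><img alt="realm" width="16" src="b.png">second</td></tr>'
      . '</table><p><img width="32" src="c.png">skip</p>';

$doc = new DOMDocument();
// Collect libxml warnings internally; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//img[@width="16"]') as $img) {
    // Text of the element that contains each 16px icon.
    $texts[] = trim($img->parentNode->textContent);
}
```

With the real page, `$doc->loadHTML($rank_content)` would replace the sample string.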

Niet the Dark Absol

I give up on this one. It seems the problem is the LENGTH of the haystack variable $rank_content: its length is about 90,000 characters, while the maximum allowed length for preg_match() is about 30,000, so I guess the same applies to preg_replace(). Solving this should still be possible if anybody is interested; have a look at this link: PHP preg_match_all limit
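Whatever the exact cause turns out to be, it helps to tell a PCRE failure apart from a hang: when PCRE gives up (for example because the `pcre.backtrack_limit` ini setting is exceeded), preg_replace() returns null instead of looping forever, and preg_last_error() reports why. A minimal sketch; the wrapper name `preg_replace_checked` is hypothetical:

```php
<?php
// Hypothetical wrapper: run preg_replace() and throw instead of silently
// returning null when PCRE reports an error.
function preg_replace_checked(string $pattern, string $replacement, string $subject, int $limit = -1): string
{
    $result = preg_replace($pattern, $replacement, $subject, $limit);
    if ($result === null) {
        $code = preg_last_error();
        $reason = ($code === PREG_BACKTRACK_LIMIT_ERROR)
            ? 'backtrack limit exceeded'
            : "PCRE error code $code";
        throw new RuntimeException("preg_replace failed: $reason");
    }
    return $result;
}
```

Raising the limit with `ini_set('pcre.backtrack_limit', ...)` is possible, but a pattern that does less backtracking, or a plain strpos() search, usually avoids the problem altogether.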

I am going to solve the problem with a different way of reading the website's contents, such as HtmlUnit, or perhaps by retrieving the page line by line.
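A minimal sketch of the "line by line" idea using fopen()/fgets(). The function name and marker are hypothetical, and reading a URL this way requires `allow_url_fopen`, just as file_get_contents() does:

```php
<?php
// Hypothetical helper: skip lines from $handle until one contains $marker,
// then return the rest of that line plus everything after it.
function read_after_marker($handle, string $marker): ?string
{
    while (($line = fgets($handle)) !== false) {
        $pos = strpos($line, $marker);
        if ($pos !== false) {
            return substr($line, $pos + strlen($marker)) . stream_get_contents($handle);
        }
    }
    return null; // marker never appeared
}
```

For the real page, `$handle` would come from `fopen('https://www.championsofregnum.com/index.php?l=1&ref=gmg&sec=42&world=2', 'r')`. Each line is searched with strpos(), so no single regex ever has to process the whole page.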

phil294