
I'm trying to use str_replace() to search and replace specific strings in html pages. For example, I am replacing:

$search_string = 'The new&nbsp;funding follows a <a href="http://blog.classpass.com/2015/01/15/were-so-excited-to-share-our-biggest-news-ever/">$40 million raise announced</a> in January.';

with

$replacement = '<span class="newString">The new&nbsp;funding follows a <a href="http://blog.classpass.com/2015/01/15/were-so-excited-to-share-our-biggest-news-ever/">$40 million raise announced</a> in January.</span>';

$subject = file_get_contents("some-web-site.html");

$new_string = str_replace($search_string, $replacement, $subject);

However, the replacement doesn't work when $subject contains a lot of HTML. If I just do:

$subject = "some text some text " .  $search_string . "some text some text";

the sentence is correctly replaced. The issue seems to arise specifically because of the &nbsp; entity: if $search_string does not contain &nbsp;, it is replaced successfully no matter how complex $subject is (i.e. even if it contains a full web page).

Any idea why that is?

user3857924
  • Are you using `preg_replace` or `str_replace`? You mention both. Doing `str_replace` with something like `"&nbsp;"` seems to work fine for me. `preg_replace` could be failing because it's coming across unescaped metacharacters, e.g. `$`. – Chris Sprague Nov 11 '15 at 21:39
  • Sorry for the confusion, I am using str_replace everywhere. But I would use preg_replace if it can solve the problem – user3857924 Nov 11 '15 at 21:48
  • Possible duplicate of [RegEx with preg\_match to find and replace a SIMILAR string](http://stackoverflow.com/questions/33671497/regex-with-preg-match-to-find-and-replace-a-similar-string) – Madivad Nov 12 '15 at 14:54

1 Answer


It seems to be a problem with the $40 part of $search_string, not the &nbsp;.

Take this program for example:

<?php

$input = '$40 &nbsp; replace failed'; // string literal

$str_replace_result = str_replace('$40 &nbsp; replace failed', 'str_replace worked', $input);
print_r($str_replace_result . "\n"); // ==> works

$preg_replace_result = preg_replace('/$40 &nbsp; replace failed/', 'preg_replace worked', $input);
print_r($preg_replace_result . "\n"); // ==> fails: unescaped "$" is an end-of-string anchor in a regex

// Example without the "$40"
$another_string = '&nbsp; replace 2 failed';
$preg_replace_result2= preg_replace('/&nbsp; replace 2 failed/', 'preg_replace worked', $another_string);
print_r($preg_replace_result2. "\n"); // ==> works, implying the "$40" bit was the issue

To fix this issue use preg_quote, e.g.:

// Solution
$escaped_search = '/' . preg_quote('$40 &nbsp; replace failed', '/') . '/'; // quote only the text, then add the delimiters
$newnewstr = preg_replace($escaped_search, 'preg_replace worked', $input);
print_r($newnewstr . "\n"); // ==> works

More info in this question.

In conclusion, the issue was that an unescaped regex metacharacter (here the $, which anchors the end of the string) caused the preg_replace match to fail.
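
To make that conclusion concrete, here is a minimal sketch that isolates the effect of the $ on its own. The sample $input is hypothetical and not taken from the page in the question; only preg_match and preg_quote are used.

```php
<?php
// Hypothetical sample input, not taken from the asker's page.
$input = 'follows a $40 million raise';

// Unescaped, "$" asserts the end of the subject, so "/$40/" can never match.
$unescaped_hits = preg_match('/$40/', $input);

// preg_quote() escapes "$"; passing "/" as the second argument also
// escapes the delimiter if it occurs in the search text.
$pattern = '/' . preg_quote('$40 million', '/') . '/';
$escaped_hits = preg_match($pattern, $input);

echo $pattern, "\n";        // /\$40 million/
var_dump($unescaped_hits);  // int(0)
var_dump($escaped_hits);    // int(1)
```

None of this applies to str_replace, which treats the search string literally and has no metacharacters.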

All that said, is it required that you run the replacement over the entire web page in this way? This approach (as is apparent here) seems to be error-prone.
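
If you stay with str_replace, one way to narrow things down is to check whether the sentence occurs in the page byte for byte. The sketch below rests on an unconfirmed assumption: that the file stores the non-breaking space in a different spelling (&#160;, &#xA0;, or the raw U+00A0 character) than the &nbsp; in $search_string. The inline $subject and the helper normalize_nbsp are hypothetical stand-ins, and "\u{00A0}" needs PHP 7 or later.

```php
<?php
// Hypothetical stand-in for file_get_contents("some-web-site.html"):
// here the page stores the non-breaking space as the raw UTF-8 character.
$subject = "<p>The new\u{00A0}funding follows a \$40 million raise.</p>";
$search_string = 'The new&nbsp;funding follows a $40 million raise.';

// An exact byte-for-byte search fails because "&nbsp;" !== "\u{00A0}".
$found_raw = strpos($subject, $search_string);
var_dump($found_raw); // bool(false)

// Hypothetical helper: rewrite common spellings of the non-breaking
// space to "&nbsp;" so both strings use the same form.
function normalize_nbsp(string $html): string
{
    return str_replace(['&#160;', '&#xA0;', "\u{00A0}"], '&nbsp;', $html);
}

$normalized = normalize_nbsp($subject);
$new_string = str_replace(
    $search_string,
    '<span class="newString">' . $search_string . '</span>',
    $normalized
);
echo $new_string, "\n";
```

Note that the normalization rewrites every non-breaking space in the page, not just the one inside the matched sentence. If strpos already finds the sentence in your real file, this assumption does not apply and the mismatch lies elsewhere (for example in whitespace or line breaks).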

Chris Sprague
  • Thanks, but I don't think it's related to the dollar sign. If I create a small `$subject = "some text" . $search_string /* this contains $40 */ . "another text"`, it works – user3857924 Nov 11 '15 at 22:17
  • I just need to replace that string inside the web page, not the whole page – user3857924 Nov 11 '15 at 22:19
  • @user3857924 alright. Only other thing I could think of is using smart strings (`"`) vs string literals (`'`), where you want to be using the latter here due to the nature of pattern matching (this still relates back to the `"$"` issue). Other than that, if `preg_quote` doesn't fix your issue, I'm not sure, at least based on the given information. – Chris Sprague Nov 11 '15 at 22:22