Assuming that $filter works fine and the source is fetched correctly, you can also use a regular-expression replace:
$contentImg = preg_replace('/^https?:/','', $string);
Here '/^https?:/' is a regex:
- the ^ character matches the beginning of the string, so that only a protocol at the front is removed.
- the ? is a special character that makes the preceding s optional. The pattern thus matches both http: and https:.
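To make the two points above concrete, here is a minimal sketch. The sample URLs are made up for illustration; only the pattern comes from the answer.

```php
<?php
// Hypothetical sample inputs, chosen only for illustration.
$samples = [
    'http://example.com/a.png',
    'https://example.com/a.png',
    '/local/path?next=http://example.com',
];

// preg_replace() also accepts an array of subjects and returns an array.
// ^ anchors the match at the start; s? makes the "s" optional.
$stripped = preg_replace('/^https?:/', '', $samples);

foreach ($stripped as $url) {
    echo $url, "\n";
}
// //example.com/a.png
// //example.com/a.png
// /local/path?next=http://example.com
```

The third input is left alone because its http: is not at the start of the string.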
Using regexes, you can write some queries more compactly. Say (for the sake of argument) that you also wish to remove ftp and sftp; you can then use:
'/^(https?|s?ftp):/'
Here | means "or" and the parentheses are for grouping purposes.
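As a rough sketch of the grouped alternation in use (the helper name stripProtocol and the sample URLs are hypothetical, not part of the question):

```php
<?php
// Hypothetical helper name; not from the original question or answer.
function stripProtocol(string $url): string
{
    // (https?|s?ftp) groups the alternatives http, https, ftp and sftp.
    // preg_replace() returns null on a regex error, so fall back to the input.
    return preg_replace('/^(https?|s?ftp):/', '', $url) ?? $url;
}

foreach (['http://a.example/x', 'sftp://a.example/x', 'mailto:me@a.example'] as $url) {
    echo stripProtocol($url), "\n";
}
// //a.example/x
// //a.example/x
// mailto:me@a.example
```

Schemes that are not listed in the group, such as mailto:, pass through unchanged.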
You also forgot to remove the colon (:).
I'm however more worried that your $filter will contain the entire page source code. In that case, it can do more harm than good, since ordinary text containing http: will also get altered. In order to parse and process XML/HTML, it is better to use a DOM parser such as PHP's DOMDocument. This will introduce some overhead, but as some software engineers argue: "Software engineering is engineering systems against fools, the universe currently produces more and more fools, the small bit of additional overhead is thus justifiable".
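To illustrate the worry, here is a small sketch that applies both string-based approaches to a whole fragment. The markup is a made-up example, assuming $filter holds page source rather than a single URL.

```php
<?php
// Hypothetical page fragment: the paragraph text mentions "http:" literally.
$html = '<p>Type http: in front of the host.</p><img src="http://example.com/a.png">';

// str_replace() touches every occurrence, including the visible text.
$viaStrReplace = str_replace(['http:', 'https:'], '', $html);
echo $viaStrReplace, "\n";
// <p>Type  in front of the host.</p><img src="//example.com/a.png">

// The anchored regex only looks at the very start of the string,
// so on a whole page it changes nothing at all.
$viaRegex = preg_replace('/^https?:/', '', $html);
var_dump($viaRegex === $html); // bool(true)
```

Neither result is what you want on full page source, which is why working on the src attributes through a parser is the safer route.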
Example:
You should definitely use a DOM parser, as argued before (since such an approach is more failsafe):
$dom = new DOMDocument;
$dom->loadHTML($html); // $html is the input document
foreach ($dom->getElementsByTagName('img') as $image) {
    // strip a leading http: or https: from each image's src
    $image->setAttribute('src', preg_replace('/^https?:/', '', $image->getAttribute('src')));
}
$html = $dom->saveHTML(); // $html now stores the new version
(Running this in php -a gives the expected output for your test example.)
Or in a post-processing step:
$html = get_the_content();
$dom = new DOMDocument;
$dom->loadHTML($html); // $html is the input document
foreach ($dom->getElementsByTagName('img') as $image) {
    $image->setAttribute('src', preg_replace('/^https?:/', '', $image->getAttribute('src')));
}
$html = $dom->saveHTML();
echo $html;
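If you need this in more than one place, the same steps could be wrapped in a function. The sketch below makes two assumptions that are not in your code: the helper name makeImagesProtocolRelative is made up, and parser warnings for sloppy markup are silenced with libxml_use_internal_errors().

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function makeImagesProtocolRelative(string $html): string
{
    $dom = new DOMDocument;

    // Real-world markup is often invalid; collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        $image->setAttribute('src', preg_replace('/^https?:/', '', $src));
    }

    return $dom->saveHTML();
}

// Made-up input: the text mentions "http:" and must survive untouched.
$html = '<p>Type http: in front of the host.</p><img src="https://example.com/a.png">';
echo makeImagesProtocolRelative($html);
```

Note that loadHTML() treats its input as a complete document, so when you pass a fragment, saveHTML() returns it wrapped in a doctype and html/body elements.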
Performance:
The performance was tested using the php -a interactive shell (10'000'000 calls per variant):
$ php -a
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { str_replace(array('http:', 'https:'), '', 'http://www.google.com'); }; echo (microtime(true)-$timea); echo "\n";
5.4192590713501
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/^https?:/','', 'http://www.google.com'); }; echo (microtime(true)-$timea); echo "\n";
5.986407995224
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/https?:/','', 'http://www.google.com'); }; echo (microtime(true)-$timea); echo "\n";
5.8694758415222
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/(https?|s?ftp):/','', 'http://www.google.com'); }; echo (microtime(true)-$timea); echo "\n";
6.0902049541473
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { str_replace(array('http:', 'https:','sftp:','ftp:'), '', 'http://www.google.com'); }; echo (microtime(true)-$timea); echo "\n";
7.2881300449371
Thus (10'000'000 calls each):
str_replace:            5.4193 s    0.00000054193 s/call
preg_replace (with ^):  5.9864 s    0.00000059864 s/call
preg_replace (no ^):    5.8695 s    0.00000058695 s/call
For more possible protocols (including sftp and ftp):
str_replace:            7.2881 s    0.00000072881 s/call
preg_replace (no ^):    6.0902 s    0.00000060902 s/call
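If you want to repeat the measurement on your own machine, the one-liners can be folded into a small helper. This is only a sketch: secondsPerCall is a hypothetical name, and wrapping each call in a closure adds some overhead of its own, so the absolute numbers will differ from the ones listed here.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns seconds per call.
function secondsPerCall(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

$url = 'http://www.google.com';
$candidates = [
    'str_replace' => function () use ($url) {
        return str_replace(['http:', 'https:'], '', $url);
    },
    'preg_replace' => function () use ($url) {
        return preg_replace('/^https?:/', '', $url);
    },
];

foreach ($candidates as $name => $fn) {
    printf("%-14s %.11f s/call\n", $name, secondsPerCall($fn, 100000));
}
```

The iteration count of 100000 is an arbitrary choice to keep the run short; raise it for more stable figures.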