
I'm trying to change the src attribute of an iframe from http to https. For example, my string is:

<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>

What I need is to change it to

<p>Some random text <iframe src="https://some-random-link.com" width="425" height="350" frameborder="0" ></iframe></p>

So far, I've been trying with preg_replace, but without results:

$res = preg_replace( '/<iframe\s+.*?\s+src="http(.*?)".*?<\/iframe>/', '<iframe\s+.*?\s+src="https$1".</iframe>', $string);

Thank you

Adrian
  • @Thielicious as much as that might be possible, not all clients will have JavaScript enabled, thus, doing this server-side ensures the changes are *actually* propagated. – ctwheels Nov 01 '17 at 14:33
  • @Thielicious well to **ensure** they are using the proper protocol, the only way is to do this server-side. – ctwheels Nov 01 '17 at 14:40
  • @ctwheels It's mostly bots that have javascript disabled. You really don't need to worry about the average user having javascript disabled. Based on [this website](https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missing-out-on-javascript-enhancement/), it appears as though roughly 1% of people disable javascript, and if you also account for the number of people actually using your site (depending on the function of your site), the percentage of your users affected is probably much less. – GrumpyCrouton Nov 01 '17 at 14:48
  • @GrumpyCrouton that's true, but it still needs to be taken into account. Some users disabled JavaScript for accessibility purposes. See [this StackExchange Software Engineering post for more information about *why do people disable JavaScript?*](https://softwareengineering.stackexchange.com/questions/26179/why-do-people-disable-javascript) – ctwheels Nov 01 '17 at 14:55
  • @ctwheels Thanks for the link. One section of the accepted answer stood out to me; _there might be perfectly good situations where you don't need to bother about supporting JavaScript_ - We have no idea what OP's site is, but it is very possible that it does not need to be accounted for. I think either a server-side or client-side solution would probably be fine for OP. – GrumpyCrouton Nov 01 '17 at 14:59
  • @GrumpyCrouton you're likely correct, but the OP didn't tag `JavaScript` in the question, they only tagged `PHP` and `Regex`, thus, the answer should be presented for `PHP`. – ctwheels Nov 01 '17 at 15:04
  • @ctwheels Fair enough :) – GrumpyCrouton Nov 01 '17 at 15:29

3 Answers


Try the following regex instead (DEMO):

/<iframe.*?\s*src="http(.*?)".*?<\/iframe>/
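To actually perform the replacement, the pattern has to capture the part it should keep. Here is a minimal sketch, assuming the src attribute is double-quoted and the input is in $string as in the question. The tightened pattern is an illustrative variation, not the one from the demo:

```php
<?php
$string = '<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>';

// Group 1 keeps everything from "<iframe" up to the opening quote of src;
// only the literal "http:" that follows it is rewritten.
// [^>]*? stops the match from running past the end of the opening tag.
$pattern = '/(<iframe\b[^>]*?\ssrc=")http:/i';

$res = preg_replace($pattern, '$1https:', $string);

echo $res;
```

Because the pattern requires "http:" right after the quote, a src that already starts with "https:" is left untouched.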

But beware: you cannot reliably parse HTML with regex. Please use a DOM (XML/HTML) parser instead.

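Also, it seems you only want to change http to https. For that, a plain string replacement is enough:

// 'http://' never matches inside 'https://', so existing https links are left alone.
// Note: this rewrites every http:// URL in $string, not only those in iframes.
$string = str_replace('http://', 'https://', $string);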
mega6382

You can try this regex:

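/(<iframe.+?src="http)(?=:)/

Live demo here

Sample code in PHP:

// Matching "http" literally before the colon keeps an existing https src from becoming "httpss".
$re = '/(<iframe.+?src="http)(?=:)/';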
$str = '<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>';
$subst = '\\1s';

$result = preg_replace($re, $subst, $str);

echo $result; 
// <p>Some random text <iframe src="https://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>
Ashish Ranjan

Why should you use a legitimate DOM parser instead of regex -- even for such a minor string manipulation?

  • Because regex is not "DOM-aware" -- it will treat a substring that isn't a tag as if it was a tag just because it resembles a tag.

  • Because your input may change slightly with or without your consent.

  • Because your required string manipulation may grow in complexity as your application matures.

  • Because using dedicated tools for the tasks they were designed to tackle makes you appear to be a careful, considered, and professional IT craftsman/craftswoman.
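The first point can be illustrated with a small sketch. The sample markup is made up for illustration: one real iframe plus an HTML comment that merely contains iframe-like text. A regex counts both, whereas DOMDocument only sees the real element.

```php
<?php
// Hypothetical input: a real iframe followed by a comment that only looks like one.
$html = '<div><iframe src="http://real.example"></iframe>'
      . '<!-- old embed: <iframe src="http://commented-out.example"></iframe> --></div>';

// The regex approach counts both occurrences, including the commented-out one.
$regexHits = preg_match_all('/<iframe\b[^>]*\ssrc="http:/', $html);

// The DOM approach only counts actual iframe elements.
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$domHits = $dom->getElementsByTagName('iframe')->length;

echo "regex: $regexHits, DOM: $domHits\n"; // regex: 2, DOM: 1
```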

First, loop over the iframe nodes with a DOM parser, check each src with a URL parser (parse_url()), then use substr_replace() to inject the 's' without removing any of the original characters.

Code: (Demo)

$html = <<<HTML
<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('iframe') as $iframe) {
    $src = $iframe->getAttribute('src');
    if (parse_url($src, PHP_URL_SCHEME) === 'http') {
        $iframe->setAttribute('src', substr_replace($src, 's', 4, 0));
    }
}
echo $dom->saveHTML();

Alternatively, you can target the qualifying src attributes with XPath.

Code: (Demo)

$html = <<<HTML
<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe>
<iframe src="https://cant-touch-this.com" width="425" height="350" frameborder="0"></iframe>
</p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//iframe[starts-with(@src, 'http') and not(starts-with(@src, 'https'))]/@src") as $src) {
    $src->nodeValue = substr_replace($src->nodeValue, 's', 4, 0);
}
echo $dom->saveHTML();

Not only are these techniques more reliable than regex, but the parser syntax is also far more readable and will make your script much easier to maintain over time.
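If this needs to run in several places, the first technique can be wrapped in a small function. This is only a sketch under assumptions: the name forceHttpsIframeSrc() is made up, the input is assumed to be a non-empty HTML fragment with a single root element (as in the examples above), and libxml warnings are collected silently in case the markup is not perfectly well-formed.

```php
<?php
/**
 * Hypothetical helper: upgrade http iframe sources to https in an HTML fragment.
 * Assumes the fragment is non-empty and has a single root element.
 */
function forceHttpsIframeSrc(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('iframe') as $iframe) {
        $src = $iframe->getAttribute('src');
        // Compare case-insensitively so that e.g. "HTTP://" is also upgraded.
        if (strcasecmp((string) parse_url($src, PHP_URL_SCHEME), 'http') === 0) {
            // Insert the 's' after the four scheme characters, keeping the rest intact.
            $iframe->setAttribute('src', substr_replace($src, 's', 4, 0));
        }
    }

    return $dom->saveHTML();
}

echo forceHttpsIframeSrc(
    '<p><iframe src="http://some-random-link.com"></iframe> '
    . '<iframe src="https://cant-touch-this.com"></iframe></p>'
);
```

Keeping the parsing and the rewrite in one place means a later change (for example, also handling other tags) only touches this function.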

mickmackusa