Change iframe src with regex

Question

I'm trying to change the src attribute of an iframe from http to https. For example, my string is:

<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>

What I need is to change it to

<p>Some random text <iframe src="https://some-random-link.com" width="425" height="350" frameborder="0" ></iframe></p>

So far, I've been trying preg_replace, but with no results:

$res = preg_replace( '/<iframe\s+.*?\s+src="http(.*?)".*?<\/iframe>/', '<iframe\s+.*?\s+src="https$1".</iframe>', $string);

Thank you

@Thielicious as much as that might be possible, not all clients will have JavaScript enabled, thus, doing this server-side ensures the changes are *actually* propagated. — ctwheels, Nov 01 '17 at 14:33
@Thielicious well to **ensure** they are using the proper protocol, the only way is to do this server-side. — ctwheels, Nov 01 '17 at 14:40
@ctwheels It's mostly bots that have JavaScript disabled. You really don't need to worry about the average user having JavaScript disabled. Based on [this website](https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missing-out-on-javascript-enhancement/), it appears that roughly 1% of people disable JavaScript, and once you account for how many people actually use your site (depending on what it does), the share of your visitors affected is probably much smaller. — GrumpyCrouton, Nov 01 '17 at 14:48
@GrumpyCrouton that's true, but it still needs to be taken into account. Some users disabled JavaScript for accessibility purposes. See [this StackExchange Software Engineering post for more information about *why do people disable JavaScript?*](https://softwareengineering.stackexchange.com/questions/26179/why-do-people-disable-javascript) — ctwheels, Nov 01 '17 at 14:55
@ctwheels Thanks for the link. One section of the accepted answer stood out to me; _there might be perfectly good situations where you don't need to bother about supporting JavaScript_ - We have no idea what OP's site is, but it is very possible that it does not need to be accounted for. I think either a server-side or client-side solution would probably be fine for OP. — GrumpyCrouton, Nov 01 '17 at 14:59
@GrumpyCrouton you're likely correct, but the OP didn't tag `JavaScript` in the question, they only tagged `PHP` and `Regex`, thus, the answer should be presented for `PHP`. — ctwheels, Nov 01 '17 at 15:04

mega6382 · Accepted Answer · 2017-11-01T15:00:50.737

4

Try the following regex instead:

/<iframe.*?\s*src="http(.*?)".*?<\/iframe>/
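
For context, the pattern in the question fails on the sample input because `\s+.*?\s+` demands two separate runs of whitespace between `<iframe` and `src`, and its replacement string contains regex syntax, which `preg_replace` would emit literally. Below is a minimal sketch of one way to wire up a replacement. It assumes a variant pattern (not the exact one above) that captures the text on either side of the scheme and puts it back with `$1` and `$2`:

```php
<?php
// Sample input taken from the question.
$string = '<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>';

// Group 1: everything from "<iframe" up to and including src="
// Group 2: the rest of the URL, from "://" up to the closing quote
// Only the literal "http" between the two groups is swapped out.
$pattern = '/(<iframe\b[^>]*?\ssrc=")http(:\/\/[^"]*")/i';

$result = preg_replace($pattern, '$1https$2', $string);

echo $result, "\n";
```

Because the pattern requires `http` to be followed directly by `://`, a `src` that already starts with `https://` is left alone, so running the replacement twice should not produce `httpss`.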

But beware: you cannot reliably parse HTML with regex. Please use an HTML/XML parser instead.

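Also, it seems you only want to change http to https. For that, try the following instead:

// Matching on 'http://' (with the slashes) leaves existing 'https://' URLs alone,
// so no strpos() guard is needed. Note that this rewrites every such URL in
// $string, not only iframe src values.
$string = str_replace('http://', 'https://', $string);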

edited Nov 01 '17 at 15:00

answered Nov 01 '17 at 14:24

mega6382


The first `.*` should be `.*?` or you'll match from the first `iframe` to the last – ctwheels Nov 01 '17 at 15:00

Ashish Ranjan · Answer 2 · 2017-11-01T15:02:05.423

3

You can try this regex:

/(<iframe.+?src=".*?)(?=:)/


Sample code in PHP:

$re = '/(<iframe.+?src=".*?)(?=:)/';
$str = '<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>';
$subst = '\\1s';

$result = preg_replace($re, $subst, $str);

echo $result; 
// <p>Some random text <iframe src="https://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>
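
One case worth checking with this pattern is input whose `src` already uses https: because `.*?` accepts any scheme before the colon, the `s` is appended again. Below is a hedged sketch of a stricter variant (an assumption, not part of the answer) that requires the literal `http` directly before the colon and keeps the match inside a single tag. The `.example` hostnames are placeholders:

```php
<?php
$original = '/(<iframe.+?src=".*?)(?=:)/';          // pattern from the answer
$guarded  = '/(<iframe\b[^>]*?\ssrc="http)(?=:)/';  // hypothetical stricter variant

$secure   = '<iframe src="https://secure.example"></iframe>';
$insecure = '<iframe src="http://plain.example"></iframe>';

// The original pattern appends "s" to whatever precedes the first colon.
echo preg_replace($original, '\\1s', $secure), "\n";
// <iframe src="httpss://secure.example"></iframe>

// The stricter variant leaves https alone and still upgrades http.
echo preg_replace($guarded, '\\1s', $secure), "\n";
// <iframe src="https://secure.example"></iframe>
echo preg_replace($guarded, '\\1s', $insecure), "\n";
// <iframe src="https://plain.example"></iframe>
```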

edited Nov 01 '17 at 15:02

answered Nov 01 '17 at 14:25

Ashish Ranjan


The first `.+` should be `.+?` or you'll match from the first `iframe` to the last `src` – ctwheels Nov 01 '17 at 15:00

not exactly first `iframe` to the last, first `iframe` to the last `src`. anyways thanks, updated. – Ashish Ranjan Nov 01 '17 at 15:03

score 1 · Answer 3 · answered Feb 17 '20 at 13:43

Why should you use a legitimate DOM parser instead of regex -- even for such a minor string manipulation?

Because regex is not "DOM-aware" -- it will treat a substring that isn't a tag as if it were a tag just because it resembles a tag.
Because your input may change slightly with or without your consent.
Because your required string manipulation may grow in complexity as your application matures.
Because using dedicated tools for the tasks they were designed to tackle makes you appear to be a careful, considered, and professional IT craftsman/craftswoman.
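
The first point can be illustrated with markup that only looks like an iframe. This is a small sketch, assuming a made-up snippet in which the iframe sits inside an HTML comment. The regex (a hypothetical iframe-scoped pattern) rewrites it anyway, while the DOM parser reports no iframe elements at all:

```php
<?php
// Hypothetical input: the "iframe" here is comment text, not an element.
$html = '<p>Old embed: <!-- <iframe src="http://old.example"></iframe> --></p>';

// A regex only sees characters, so it edits the comment as well.
$viaRegex = preg_replace('/(<iframe\b[^>]*?\ssrc=")http:/', '$1https:', $html);
echo $viaRegex, "\n";

// The DOM parser treats the comment as a comment node, so no iframe is found.
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $dom->getElementsByTagName('iframe')->length, "\n";
```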

First, loop over the iframe nodes with a DOM parser, check each src value with a URL parser, then use substr_replace() to inject the 's' without removing any of the original characters.

Code:

$html = <<<HTML
<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe></p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('iframe') as $iframe) {
    $src = $iframe->getAttribute('src');
    if (parse_url($src, PHP_URL_SCHEME) === 'http') {
        $iframe->setAttribute('src', substr_replace($src, 's', 4, 0));
    }
}
echo $dom->saveHTML();

Alternatively, you can target the qualifying src attributes with XPath.

Code:

$html = <<<HTML
<p>Some random text <iframe src="http://some-random-link.com" width="425" height="350" frameborder="0"></iframe>
<iframe src="https://cant-touch-this.com" width="425" height="350" frameborder="0"></iframe>
</p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//iframe[starts-with(@src, 'http') and not(starts-with(@src, 'https'))]/@src") as $src) {
    $src->nodeValue = substr_replace($src->nodeValue, 's', 4, 0);
}
echo $dom->saveHTML();
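
If only `http://` URLs need upgrading, the two `starts-with()` checks can arguably be collapsed into one by including the `://` in the prefix. Here is a sketch under that assumption, wrapped in a hypothetical helper named `upgradeIframeSrc()` so it can be reused:

```php
<?php
// Hypothetical helper: upgrades http:// iframe sources to https://.
function upgradeIframeSrc(string $html): string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);

    // 'http://' cannot match 'https://', so no not(...) clause is needed.
    foreach ($xpath->query("//iframe[starts-with(@src, 'http://')]") as $iframe) {
        $src = $iframe->getAttribute('src');
        $iframe->setAttribute('src', substr_replace($src, 's', 4, 0));
    }

    return (string) $dom->saveHTML();
}

$html = '<p><iframe src="http://some-random-link.com"></iframe>'
      . '<iframe src="https://cant-touch-this.com"></iframe></p>';

echo upgradeIframeSrc($html);
```

Note that XPath's `starts-with()` is case-sensitive, so a `src` written as `HTTP://...` would be skipped by this version.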

Not only are these techniques more reliable than regex, but the syntax of these parsers is also far easier for humans to read, which will make your script much easier to maintain over time.
