0

On my page the users are allowed to insert embed links to their music. But if the width of the object is more than 600px, it doesn´t fit into page. So I was searching for some way to find the width parameter of the iframe tag and if the value is more than 600px, to change it to 600px. But my google skills failed me, so I hope someone here can help. Inserted iframes look for example like this:

<iframe width="660" height="180" src="//www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2FCLCK_Podcast%2Fclck-podcast-20-czscream%2F&amp;embed_type=widget_standard&amp;embed_uuid=fd5ce7dc-9687-45e1-8368-27631b48385f&amp;hide_tracklist=1&amp;replace=0&amp;hide_cover=1" frameborder="0"></iframe>
Michal S
  • 1,103
  • 6
  • 19
  • 31

3 Answers3

6

Here's how to do it properly with PHP's native DOM extensions. Unlike a regex, this is guaranteed to work for all the cases and will not choke when supplied with a slightly different HTML markup format:

$dom = new DOMDocument;
$dom->loadHTML($html);    
$html = '';

foreach ($dom->getElementsByTagName('iframe') as $iframe) {

    // get the 'width' attribute value
    $width = (int) $iframe->getAttribute('width');     

    // obvious
    if ($width > 600) {
            $iframe->setAttribute('width', 600);
    }   

    // append the modified HTML to string
    $html .= $dom->saveHTML($iframe) . "\n";                
}

echo $html;

This will check the width attribute of all <iframe> tags and set it to 600 if the current value is greater than that.

Online demo

Amal Murali
  • 75,622
  • 18
  • 128
  • 150
  • Thanks, that looks nice...but if I got more iframes in the string, in output it leaves just the first iframe, others are lost... – Michal S Feb 27 '14 at 14:12
  • Still doesn´t work :/ Im even trying it exactly according to your example, with multiple but static iframes... – Michal S Feb 27 '14 at 15:32
  • @MichalS: I don't understand what "doesn't work" means. Can you show the HTML string you're working with? (Please also post the expected output, so I could take a look.) – Amal Murali Feb 27 '14 at 15:34
  • Here I edited your example: https://eval.in/106464 Expected output int this case would be three iframes, but there is still only one of them. – Michal S Feb 27 '14 at 15:40
  • 1
    @MichalS: Ah, I can't believe I forgot to update the `$html` variable when I updated the answer. **It's fixed now.** You could also use use `$html = $dom->saveHTML();` to get the whole HTML markup, but that'll also include the `DOCTYPE` declarations as well. I just assumed you didn't want that, and have modified my answer to produce the required output (the demo's been updated, too). – Amal Murali Feb 27 '14 at 15:57
2

This reads to me like the wrong approach, see: the XY problem.

When looking at your question it looks like you are allowing users to add and run arbitrary code on your website. Which is probably a pretty awful thing to do. What is preventing someone to exploit XSS if they are allowed to do that? Even when you are filtering correctly (which I doubt) it still sounds like the wrong approach.

Users should only have to add a link to your site so you are in total control of the further embedding and rendering. This will prevent all kinds of nasty shit.

Community
  • 1
  • 1
PeeHaa
  • 71,436
  • 58
  • 190
  • 262
-3

Let's try with regular expressions:

 $pattern = '/<iframe (.*)width="([0-9]{4,}|[6-9][0-9]{2})"(.*)>(.*)<\/iframe>/';
 $replace = '<iframe $1 width="600" $3>$4</iframe>';
 $output = preg_replace($pattern, $replace, $input);

$output would be your valid string with valid iframes ; )

Online Demo

Doro
  • 755
  • 8
  • 25
  • Nice! That works like a charm! But thanks to Amal too! – Michal S Feb 27 '14 at 15:45
  • 2
    -1. This solution is so brittle and will break for a lot of cases. Using `.*` for matching is almost never a good idea. – Amal Murali Feb 27 '14 at 15:59
  • well, your method is also not perfect. look at the example https://eval.in/106488. I think that we want to give a solution and not to argue which method is better and more safety. – Doro Feb 27 '14 at 16:07
  • 3
    @Doro: ` ...` is **invalid markup**. The regex solution will fail for valid markup, too. There's the difference. And if you're talking about the errors, you could use `libxml_use_internal_errors(1)` suppress the error messages. See [this demo](https://eval.in/106489). – Amal Murali Feb 27 '14 at 16:11
  • using `libxml_use_internal_errors(1)` is not a solution because u still return invalid string. Give me examples and I'll try to fix it. Telling that "your method is broken" without telling why is childish. No offence. – Doro Feb 27 '14 at 16:20
  • 4
    @Doro *"I think that we want to give a solution and not to argue which method is better and more safety"* Wrong! We want to give **quality** answers, not bad answers. Regex should not be used for parsing HTML, we use DOMDocument instead in this case. -1 – kittycat Feb 27 '14 at 18:10
  • Ok, after some researching I can agree with you that parsing HTML with regex is not the best idea. Here is the Q: http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la . I just wonder why you couldn't give me some examples instead of minusing my answer. – Doro Feb 28 '14 at 06:29