
Problem:

I need to confirm that every iframe links only to a URL in the following format:

http://www.example.com/embed/*****11 CHARACTERS MAX.****?rel=0

Starts with: http://www.example.com/embed/
Ends with: ?rel=0
11 CHARACTERS MAX. means that this spot can hold any characters, up to 11 of them, and no more.

NOTE: none of the specified tags is guaranteed to be in every post. It depends on how the user uses the editor.

I'm using PHP.


I used the line below to make sure all tags are excluded except the ones specified:

$rtxt_offer = preg_replace('#<(?!/?(u|br|iframe)\b)[^>]+>#', '', $rtxt_offer);
user311509
  • Please see: [Extract all the text and img tags from HTML in PHP.](http://stackoverflow.com/q/8021543/367456) (closed). – hakre Nov 13 '11 at 10:38
  • possible duplicate of [Best methods to parse HTML with PHP](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php) - Main point: Consider not using regular expressions to parse HTML. – hakre Nov 13 '11 at 10:39
  • Your first regex was basically a reimplementation of `strip_tags()`. You can of course augment it with another assertion just for `|iframe (?=src=)`, but it's not worth the effort. HTMLPurifier might be what you are searching for (though even more effort to do what you want). – mario Nov 13 '11 at 10:46

2 Answers


First of all, PHP has a built-in function that strips tags for you: http://php.net/manual/en/function.strip-tags.php, so there is no need for a slow regex here.
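
As a minimal sketch of that suggestion: `strip_tags()` takes an optional second argument listing the tags to keep, which covers the same `u`, `br` and `iframe` whitelist as the regex in the question. The sample markup below is made up for illustration, and `$clean` is a hypothetical variable name.

```php
<?php
// Hypothetical sample input standing in for the user's post.
$rtxt_offer = '<p>Hello <b>big</b> <u>world</u><br>'
    . '<iframe src="http://www.example.com/embed/abc?rel=0"></iframe></p>';

// Keep only <u>, <br> and <iframe>; all other tags are removed,
// but the text between them is left in place.
$clean = strip_tags($rtxt_offer, '<u><br><iframe>');

echo $clean, PHP_EOL;
```

Note that `strip_tags()` leaves the attributes of the allowed tags untouched, so the `src` value of any remaining iframe still has to be checked separately.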

Steps you'll need to solve your problem:

  1. Parse the text with DOMDocument
  2. Get the iframe nodes from it
  3. Get the src attribute of each iframe and parse it with parse_url
  4. Now you can perform simple checks on each component returned by parse_url
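
The steps above could be sketched roughly as follows. This is only one way to do it, not a definitive implementation: `isAllowedEmbed()` and `allIframesAllowed()` are hypothetical helper names, and the sketch assumes that a post without any iframe counts as valid and that "11 characters max" means byte length as measured by `strlen()`.

```php
<?php
// Hypothetical helper: checks one src value against
// http://www.example.com/embed/<up to 11 characters>?rel=0
function isAllowedEmbed(string $src): bool
{
    $parts = parse_url($src);
    if ($parts === false) {
        return false; // seriously malformed URL
    }
    // Reject anything the required format does not mention.
    foreach (['port', 'user', 'pass', 'fragment'] as $extra) {
        if (isset($parts[$extra])) {
            return false;
        }
    }
    $prefix = '/embed/';
    $path = $parts['path'] ?? '';
    if (($parts['scheme'] ?? '') !== 'http'
        || ($parts['host'] ?? '') !== 'www.example.com'
        || ($parts['query'] ?? '') !== 'rel=0'
        || strncmp($path, $prefix, strlen($prefix)) !== 0
    ) {
        return false;
    }
    // Whatever follows /embed/ may be at most 11 bytes long.
    return strlen($path) - strlen($prefix) <= 11;
}

// Hypothetical helper: true when every <iframe> in $html passes the check.
function allIframesAllowed(string $html): bool
{
    if (trim($html) === '') {
        return true; // nothing to parse, so no offending iframe
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        if (!isAllowedEmbed($iframe->getAttribute('src'))) {
            return false;
        }
    }
    return true;
}
```

Calling `allIframesAllowed($rtxt_offer)` after stripping the unwanted tags would then tell you whether the post can be accepted as is.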

Happy coding

dev-null-dweller

You wrote that you only want to validate the link value with a regular expression:

$doesMatch = preg_match('~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~', $link);

This does specifically what you're asking for.
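
A short usage sketch, under a couple of assumptions: the dots in the host name are escaped so that `.` matches only a literal dot, and the `D` modifier is added so that `$` cannot match before a trailing newline. `matchesEmbedPattern()` is a hypothetical wrapper name, and the sample links are made up.

```php
<?php
// Hypothetical wrapper around the pattern; returns true only on a full match.
function matchesEmbedPattern(string $link): bool
{
    // [^?]{0,11} allows up to 11 characters that are not "?",
    // D makes "$" match only at the very end of the string.
    $pattern = '~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~D';
    return preg_match($pattern, $link) === 1;
}

// Made-up sample links for illustration.
$samples = [
    'http://www.example.com/embed/abcdefghijk?rel=0',  // 11 characters
    'http://www.example.com/embed/abcdefghijkl?rel=0', // 12 characters
    'http://wwwXexample.com/embed/abc?rel=0',          // dot replaced
];
foreach ($samples as $link) {
    echo $link, ' => ', matchesEmbedPattern($link) ? 'ok' : 'rejected', PHP_EOL;
}
```

Comparing the result with `=== 1` matters because `preg_match()` returns `0` for no match and `false` on error.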

For removing tags, please see strip_tags, or use an HTML parser to do it, which will also help you extract the link value more reliably.

In a similar question/answer I posted some example code showing how to use strip_tags and SimpleXMLElement together: Extract all the text and img tags from HTML in PHP.

Community
hakre