Fetch any user-specified HTML tag from a live URL using a PHP regular expression

Question

I want to fetch any meta, title, script, or link tag that is present on an HTML page. This is the program I wrote (it is not correct, but it should give the experts an idea of what I am after).

<?php
function get_tag($tag_name, $url)
{
    $content = file_get_contents($url);

    // This is not correct: a proper regular expression is needed here
    preg_match_all($tag_name, $content, $matches);

    return $matches;
}

print_r(get_tag('title', 'http://stackoverflow.com'));

?>

The output should look something like this:

Array
(
    [0] => title
    [1] => Stack Overflow
)

Thanks!!

It is not recommended to use regex to get DOM elements. Use [DOMDocument](http://www.php.net/manual/en/class.domdocument.php) instead. – Ibu, Jun 24 '11 at 06:21

score 1 · Answer 1 · edited May 23 '17 at 11:55


Before using regex to parse HTML, you may want to read the first answer to this question.

Try with DOMDocument, like this:

<?php

function get_tags($tags, $url) {

    // Create a new DOMDocument to hold the page structure
    $xml = new DOMDocument();

    // Load the URL's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all matching tags to return
    $tags_found = array();

    // Loop through each <$tags> element in the DOM and add it to the $tags_found array
    foreach ($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    // Return the tags that were found
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>

answered Jun 24 '11 at 06:37 · Tudor Constantin

I just tested this code and it works like a charm on my machine, with no warnings. Anyway, you get a warning, not an error. Upvote & accept are welcomed :) – Tudor Constantin Jun 24 '11 at 06:52
This is another question - create another one. – Tudor Constantin Jun 24 '11 at 06:56
@Tudor, no it isn't another question, he asks how to get metas as well in the question. posting answer now – Liam Bailey Jun 24 '11 at 06:59
@Liam - in the original question, it is mentioned that he wants the content of some tags, now he also needs the values of some specific attributes - this can be seen both ways - as a new question or as further requirements on the first one - I personally like to keep things as simple as possible - so in my opinion 2 questions are best in this case – Tudor Constantin Jun 24 '11 at 07:13

Liam Bailey · Accepted Answer · 2011-06-24T14:29:45.000

function get_tags($tag, $url) {
    // Allow for improperly formatted HTML: collect libxml warnings instead of printing them
    libxml_use_internal_errors(true);

    // Instantiate the DOMDocument class to parse the HTML DOM
    $xml = new DOMDocument();

    // Load the file into the DOMDocument object
    $xml->loadHTMLFile($url);

    // Empty array to hold all matching tags to return
    $tags = array();

    // Loop through all tags of the given type and store their details in the array
    foreach ($xml->getElementsByTagName($tag) as $tag_found) {
        if ($tag_found->tagName === "meta") {
            // For meta tags, the useful data lives in the name and content attributes
            $tags[] = array(
                "meta_name"  => $tag_found->getAttribute("name"),
                "meta_value" => $tag_found->getAttribute("content"),
            );
        } else {
            $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
        }
    }

    // Return the tags that were found
    return $tags;
}

This version gives you the name of the tag as the first value of each entry rather than "array", and the libxml_use_internal_errors(true) call also stops the warning.
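The question also asks about script and link tags, whose useful data usually sits in attributes such as src and href. As a rough sketch only (the helper name get_tags_with_attributes and the sample markup are made up for illustration and are not part of the answer above), the same DOMDocument approach can collect every attribute of each matching element. It parses an HTML string with loadHTML() so that it can be tried without a network connection:

```php
<?php
// Hypothetical helper (not from the answer above): parses an HTML string
// rather than a URL and returns the text and all attributes of each match.
function get_tags_with_attributes(string $html, string $tag): array
{
    // Collect libxml parse warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = array();
    foreach ($dom->getElementsByTagName($tag) as $node) {
        // Copy every attribute (name, content, href, src, ...) into an array
        $attributes = array();
        foreach ($node->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }
        $found[] = array(
            'tag'        => $node->tagName,
            'text'       => trim($node->textContent),
            'attributes' => $attributes,
        );
    }
    return $found;
}

// Made-up sample markup, used only for illustration
$html = '<html><head><title>Example</title>'
      . '<meta name="description" content="Sample page">'
      . '<link rel="stylesheet" href="style.css">'
      . '</head><body></body></html>';

print_r(get_tags_with_attributes($html, 'link'));
```

To use it against a live page, the string could come from file_get_contents($url), or loadHTML() could be swapped for loadHTMLFile($url) as in the answer.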

Of course, I have edited accordingly. Glad it is working for you. — Liam Bailey, Jun 24 '11 at 14:30

score 0 · Answer 3 · answered Jun 24 '11 at 06:56


Since these tags cannot be nested, parsing is not necessary.

#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

If you are using this with your function, you will have to write $tag_name instead of "meta|title|script|link".
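One way this pattern might be wired into a function is sketched below. The function name get_tag_regex, the choice to pass in an HTML string, and the sample markup are assumptions made for illustration. preg_quote() is added so that a user-supplied tag name cannot inject regex syntax, which also means an alternation such as "meta|title" would have to be built by the caller in some other way.

```php
<?php
// Hypothetical helper: applies the pattern from this answer to an HTML string.
function get_tag_regex(string $tag_name, string $html): array
{
    // Escape regex metacharacters in the tag name; '#' is the delimiter used below
    $name = preg_quote($tag_name, '#');

    // Same pattern as in the answer, with the tag name substituted in
    $pattern = '#<(' . $name . ')(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is';

    // PREG_SET_ORDER yields one array per matched tag
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $found = array();
    foreach ($matches as $m) {
        // Group 2 (the inner text) is absent for self-closing tags
        $found[] = array(
            'tag'  => $m[1],
            'text' => isset($m[2]) ? $m[2] : '',
        );
    }
    return $found;
}

// Made-up sample markup, used only for illustration
$html = '<html><head><title>Stack Overflow</title>'
      . '<meta name="description" content="Questions and answers" />'
      . '</head></html>';

print_r(get_tag_regex('title', $html));
```

Note that the pattern expects either a closing tag or a trailing "/>". A void tag written as `<meta charset="utf-8">` has neither, so it would be missed or matched incorrectly, which is one reason the comments above lean toward DOMDocument.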

answered Jun 24 '11 at 06:56 · Leif
