How to obtain the value of an input in remote HTML using PHP?

Question

On a remote site, there is an HTML file (say http://www.example.com/abc.html), which reads:

<input id="ID1" name="NAME1" value="VALUE1">

In the PHP code on my server, I need "VALUE1" from http://www.example.com/abc.html. How can I get it using PHP?

Since the remote html is written in XHTML 1.0, I guess I could use an XML parser?

ADDED

Using xml_parse_into_struct, I obtained an array that contains:

[15] => Array
    (
        [tag] => INPUT
        [type] => complete
        [level] => 4
        [attributes] => Array
            (
                [TYPE] => hidden
                [NAME] => NAME1
                [ID] => ID1
                [VALUE] => VALUE1
            )

    )

How can I obtain "VALUE1"? I guess this is now more a question of handling arrays in PHP. I always know the name "NAME1", but I don't know the value "VALUE1". So I want to obtain "VALUE1" using "NAME1", which is the information I know.

> Since the remote html is written in XHTML 1.0, I guess I could use an XML parser?

Yep. – Ryan Kinal Aug 01 '11 at 19:32

score 1 · Accepted Answer · answered Aug 01 '11 at 19:39

Why not just use a simple regex?

$html = '<input id="ID1" name="NAME1" value="VALUE1">';

// [^>]* keeps the match inside a single tag, so another input's value is not picked up
if (preg_match('/name="NAME1"[^>]*?\svalue="([^"]*)"/i', $html, $matches)) {
   echo $matches[1];  // should echo VALUE1
}

The only constraint is that name must appear before value in the HTML element.
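
If the attribute order cannot be guaranteed, one possible workaround is to match the whole tag first and then read the attribute from it. This is only a minimal sketch: it assumes double-quoted attribute values, and the function name `input_value_by_name` is made up for illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the value
// attribute of the first <input> whose name attribute equals $name, or null.
// Assumes attribute values are double-quoted.
function input_value_by_name(string $html, string $name): ?string
{
    // Step 1: collect every <input ...> tag.
    if (!preg_match_all('/<input\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    // The attribute name is matched case-insensitively, its value exactly.
    $namePattern = '/\s(?i:name)="' . preg_quote($name, '/') . '"/';
    foreach ($tags[0] as $tag) {
        // Step 2: skip tags that do not carry the wanted name.
        if (!preg_match($namePattern, $tag)) {
            continue;
        }
        // Step 3: read value="..." from the same tag, wherever it appears.
        if (preg_match('/\s(?i:value)="([^"]*)"/', $tag, $m)) {
            return html_entity_decode($m[1], ENT_QUOTES);
        }
        return null;
    }
    return null;
}

// Works even though value comes before name here.
echo input_value_by_name('<input value="VALUE1" id="ID1" name="NAME1">', 'NAME1'), PHP_EOL;
```

Splitting the match into two steps costs a little more work than a single pattern, but it removes the dependence on attribute order.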

answered Aug 01 '11 at 19:39

Yanick Rochon


You know, there was a man who used to answer these kinds of questions. I haven't seen him in a while, so I'll take up his cause: regexes shouldn't ever be used to parse XML/HTML. – Chris Eberle Aug 01 '11 at 19:42
I'm just saying that if the `value` of that input element is only what he needs, there is no need to parse the entire XML document when a regular expression is faster and takes less memory. That is all. – Yanick Rochon Aug 01 '11 at 19:43
Ah here's his [response](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Chris Eberle Aug 01 '11 at 19:45
Agree w/Yanick - if the use is simple, parsing the document can be overkill. It is also prone to failure if the target HTML is a mess. – Chris Baker Aug 01 '11 at 19:45
@Chris (the one who posted the link) - I appreciate the valid point he raises, but the response below is far more reasonable (and mature) - for a limited and specific set of HTML, regex is just fine. – Chris Baker Aug 01 '11 at 19:48
However, if the target HTML is a mess then regex is likely unreliable anyway. Parsers can compensate for bad HTML. – Ryan Kinal Aug 01 '11 at 19:49
The HTML-regex bashing article is fine and I agree with it. However, it does not apply here. The point is not to *parse* the XML structure, but simply to extract the value of a single element. Consuming CPU time and memory to parse a few hundred (or thousand) elements is indeed overkill when simple pattern matching can achieve the same thing more efficiently. – Yanick Rochon Aug 01 '11 at 19:52
OK, well if the XML's structure doesn't change at all (in this case it looks simple enough), I guess I can get on board with that. – Chris Eberle Aug 01 '11 at 19:53
I quote the article you reference: "It's considered good form to demand that regular expressions be considered verboten, totally off limits for processing HTML, but I think that's just as wrongheaded as demanding every trivial HTML processing task be handled by a full-blown parsing engine. It's more important to understand the tools, and their strengths and weaknesses, than it is to knuckle under to knee-jerk dogmatism. " – Chris Baker Aug 01 '11 at 19:59
And the war rages on. There are standard reactions on both sides, and which side is right depends entirely upon the current question. If this question requires a *generic* solution (getting `value` from any given `input`), then parsing is the better answer. If it requires no more than a simple string-search, then regex is the better answer. – Ryan Kinal Aug 01 '11 at 20:06

score 1 · Answer 2 · answered Aug 01 '11 at 19:43

It all depends on how you fetch the array. Taking the example above, you can get the value with $array[15]['attributes']['VALUE'], where $array is the variable that holds the xml_parse_into_struct output. But if you want it to be dynamic, I suggest something a little smarter, since the index 15 will probably change if more elements are added to the page.

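// xml_parse_into_struct needs a parser and fills $array by reference
$parser = xml_parser_create();
xml_parse_into_struct($parser, $string, $array);

$input_value = null;
foreach ($array as $element) {
  // not every element has attributes, so check before comparing
  if (isset($element['attributes']['NAME']) && $element['attributes']['NAME'] == 'NAME1') {
    $input_value = $element['attributes']['VALUE'];
    break; // unless you need to do more here just break out.
  }
}

print $input_value;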
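
When more than one field is needed, it may be simpler to build a name-to-value lookup once instead of looping for each name. The following is a sketch under the assumption that the page is well-formed XML; `input_values` is a hypothetical helper name, not something from the answer above.

```php
<?php
// Hypothetical helper: maps each <input> name to its value, using the
// array that xml_parse_into_struct() fills in. Assumes $xml is well-formed.
function input_values(string $xml): array
{
    $parser = xml_parser_create();
    // Case folding is on by default, so tag and attribute names
    // arrive uppercased (INPUT, NAME, VALUE).
    $ok = xml_parse_into_struct($parser, $xml, $elements);
    if (!$ok) {
        return []; // parsing failed, e.g. the markup is not well-formed
    }

    $values = [];
    foreach ($elements as $element) {
        if ($element['tag'] !== 'INPUT') {
            continue;
        }
        $attrs = $element['attributes'] ?? [];
        if (isset($attrs['NAME'], $attrs['VALUE'])) {
            $values[$attrs['NAME']] = $attrs['VALUE'];
        }
    }
    return $values;
}

$xml = '<form><p><input type="hidden" name="NAME1" id="ID1" value="VALUE1" /></p></form>';
$values = input_values($xml);
echo $values['NAME1'], PHP_EOL;
```

The lookup no longer depends on the numeric index of the element, so it keeps working if elements are added to the page.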

score 0 · Answer 3 · answered Aug 01 '11 at 19:44

If you know the name of the element and are truly only after one little thing and the format of the page is always the same, it might be less work to just use curl and explode to parse the document with string compares. This is a quick-and-dirty way to do it, but as long as those two conditions are met this is arguably the fastest way:

$url = 'http://example.com/';
$options = array(
    CURLOPT_RETURNTRANSFER => true,     // return web page
    CURLOPT_HEADER         => false,    // don't return headers
    CURLOPT_FOLLOWLOCATION => true,     // follow redirects
    CURLOPT_ENCODING       => "",       // handle all encodings
    CURLOPT_USERAGENT      => "spider", // who am i
    CURLOPT_AUTOREFERER    => true,     // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
    CURLOPT_TIMEOUT        => 120,      // timeout on response
    CURLOPT_MAXREDIRS      => 10        // stop after 10 redirects
);

$ch      = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err     = curl_errno( $ch );
$errmsg  = curl_error( $ch );
$header  = curl_getinfo( $ch );
curl_close( $ch );
$parts = explode('<input id="ID1" name="NAME1" value="', $content);
if (count($parts) == 2) {
    $value = explode('">', $parts[1]);
    $value = $value[0];
} else {
    $value = false;
}

print 'Value is: ' . $value;

Otherwise, you could use regex (again using curl as above):

// $content is the page fetched with curl above
if (preg_match('/name="NAME1"[^>]*?\svalue="([^"]*)"/i', $content, $matches)) {
    $value = $matches[1];
}

Finally, if you want to go all-out on this one, you can use a document parser. Be warned, however, that if the HTML you are working with is not properly formed, the parser will have trouble. Here's a tutorial on the subject, using a third-party class: http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/
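
As an alternative to a third-party class, PHP's bundled DOM extension can do the same job. The sketch below is one way it might look, not the approach from the tutorial. `dom_input_value` is a hypothetical name, and the code assumes `$name` contains no double quote because it is embedded directly in the XPath query. `DOMDocument::loadHTML` tolerates many markup errors, so libxml warnings are collected rather than printed.

```php
<?php
// Hypothetical helper using PHP's bundled DOM extension: returns the value
// attribute of the first <input> with the given name, or null if none exists.
function dom_input_value(string $html, string $name): ?string
{
    if ($html === '') {
        return null; // loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Assumption: $name has no double quote, since it is placed in the query as-is.
    $nodes = $xpath->query('//input[@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('value');
}

$html = '<html><body><form><input id="ID1" name="NAME1" value="VALUE1"></form></body></html>';
echo dom_input_value($html, 'NAME1'), PHP_EOL;
```

In practice `$html` would be the `$content` string returned by curl; a literal string is used here so the example runs without network access.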

score -3 · Answer 4 · answered Aug 01 '11 at 19:44

If you need to pass a variable from an HTML page to PHP code, use forms ( http://www.w3.org/TR/html4/interact/forms.html ) in HTML and the $_POST ( http://www.php.net/manual/en/reserved.variables.post.php ) or $_GET ( http://www.php.net/manual/en/reserved.variables.get.php ) variables in PHP. If you are not familiar with arrays in PHP, take a look at this: http://www.php.net/manual/en/language.types.array.php

answered Aug 01 '11 at 19:44

Timur


This does not appear to have a single thing to do with the question. – Chris Baker Aug 01 '11 at 19:49
Hmm... Where is something about XML parsing in my answer? Are you ok? – Timur Aug 01 '11 at 20:06
That's the point - there is nothing in your answer that is related to the question. OP is looking for a way to parse a value out of a document, not **pass** a value. It is the LACK of parsing in this answer that is the problem. We're just fine... you okay? – Chris Baker Aug 01 '11 at 20:33
