
Possible Duplicate:
How to parse and process HTML with PHP?

I am trying to extract values from some HTML. Here is the part of the HTML document I am trying to get values from.

    <input type="hidden" id="first"
        value='&euro;218.33' />
    <input type="hidden" id="second"
        value='&euro;291.08' />
    <input type="hidden" id="third"
        value='&euro;344.77' />

I have used the following preg_match_all call, where $buffer contains the entire HTML of the page I am searching.

    if (preg_match_all('/<input type="hidden" id="(.+?)" value=\'&euro;(.+?)\'/', $buffer, $matches))
    {
       echo "FOUND";
       echo $matches[2][0] . " " . $matches[2][1] . " " . $matches[2][2] . "\n";
    }

This preg_match_all call does not find any matches. Any suggestions?

user1197941
  • [Simple HTML DOM](http://simplehtmldom.sourceforge.net/) is a very easy-to-use HTML parser for PHP. – Adi Sep 04 '12 at 10:23
  • @Adnan SimpleHTMLDom is crapware IMO. Suggested third-party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of string parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). – Gordon Sep 04 '12 at 10:31
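Along the lines of the DOM-based suggestions above, here is a minimal sketch that uses PHP's own DOM extension (DOMDocument and DOMXPath, enabled by default in most builds) instead of a regex. It assumes the page looks like the snippet in the question; the helper name hiddenInputValues is hypothetical. Because the HTML parser handles attributes itself, line breaks between attributes don't matter and &euro; comes back already decoded as the € character.

```php
<?php
// Minimal sketch: collect hidden <input> values with PHP's DOM extension
// instead of a regular expression. Assumes markup like the question's.

/**
 * Hypothetical helper: returns [id => value] for every hidden input that
 * has an id attribute. Entities such as &euro; are decoded by the parser.
 */
function hiddenInputValues(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect real-world HTML instead of
    // printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query('//input[@type="hidden"][@id]') as $input) {
        $values[$input->getAttribute('id')] = $input->getAttribute('value');
    }
    return $values;
}

// Same markup as in the question, including the line breaks.
$buffer = <<<'HTML'
<input type="hidden" id="first"
    value='&euro;218.33' />
<input type="hidden" id="second"
    value='&euro;291.08' />
<input type="hidden" id="third"
    value='&euro;344.77' />
HTML;

foreach (hiddenInputValues($buffer) as $id => $value) {
    echo $id, ': ', $value, "\n";
}
```

If you only need the number, you can strip the leading € from each decoded value afterwards.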

3 Answers


A very simple solution is to use str_get_html from the PHP Simple HTML DOM Parser.

HTML Example

    include "simple_html_dom.php";

    $html = "<input type=\"hidden\" id=\"first\"
        value='&euro;218.33' />
    <input type=\"hidden\" id=\"second\"
        value='&euro;291.08' />
    <input type=\"hidden\" id=\"third\"
        value='&euro;344.77' />";

Usage

    $dom = str_get_html($html);
    foreach ($dom->find('input') as $element) {
        echo $element->value . "\n"; // double quotes so \n is a real newline
    }

Output

    €218.33
    €291.08
    €344.77
Baba
  • @Gordon Click on str_get_html; it links to PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/ – Baba Sep 04 '12 at 10:25

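This regex does not match anything because the id and value attributes are not separated by a single space: in your HTML there is a line break plus indentation between them, and your pattern only allows exactly one space.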

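    preg_match_all('/<input type="hidden" id="(.+?)"[.\s\t\r\n\v\f]*?value=\'&euro;(.+?)\'/', $buffer, $matches)

Note the [.\s\t\r\n\v\f]*? just before value=. Inside a character class the dot is a literal dot, and \s already covers tabs, carriage returns, line feeds, vertical tabs and form feeds, so this class matches any run of whitespace (plus literal dots) between the closing " of the id and value=. This way spaces, tabs, line breaks and similar characters will not break your expression; for this HTML a plain \s* would behave the same.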
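To see the difference concretely, here is a small sketch that runs the question's pattern and this answer's pattern against the question's markup. The variable names are illustrative. The original pattern finds no matches because its single literal space cannot match the line break, while the whitespace-tolerant pattern finds all three inputs.

```php
<?php
// Sketch: compare the question's pattern with the whitespace-tolerant one
// on the same markup (line break + indentation between id and value).

$buffer = <<<'HTML'
<input type="hidden" id="first"
    value='&euro;218.33' />
<input type="hidden" id="second"
    value='&euro;291.08' />
<input type="hidden" id="third"
    value='&euro;344.77' />
HTML;

// Question's pattern: exactly one space between id="..." and value=.
$strict = '/<input type="hidden" id="(.+?)" value=\'&euro;(.+?)\'/';

// Answer's pattern: any run of whitespace (or literal dots) is allowed there.
$tolerant = '/<input type="hidden" id="(.+?)"[.\s\t\r\n\v\f]*?value=\'&euro;(.+?)\'/';

$strictCount = preg_match_all($strict, $buffer, $strictMatches);
$tolerantCount = preg_match_all($tolerant, $buffer, $matches);

echo "strict: $strictCount, tolerant: $tolerantCount\n";

// $matches[1] holds the ids, $matches[2] the numeric part of each value.
foreach ($matches[1] as $i => $id) {
    echo $id, ' => ', $matches[2][$i], "\n";
}
```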

Salketer

What about

    if (preg_match_all('/<input type="hidden" id="(.+?)".+?value=\'&euro;(.+?)\'/s', $buffer, $matches))
Teno
  • Not every platform treats . as matching a newline character. It is safer to specify the newline explicitly too. – Salketer Sep 04 '12 at 10:35
  • `s (PCRE_DOTALL) If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.` says the manual. – Teno Sep 04 '12 at 11:10
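A quick way to check the manual's description of s (PCRE_DOTALL) is to run Teno's pattern with and without the modifier on a single input that spans two lines. This is a sketch with illustrative variable names; in PHP's PCRE, . stops at the line break unless s is set, so only the /s version matches.

```php
<?php
// Sketch: the same pattern with and without the s (PCRE_DOTALL) modifier.
// Without s, ".+?" cannot cross the line break between id="..." and value=.

$buffer = "<input type=\"hidden\" id=\"first\"\n    value='&euro;218.33' />";

$withoutS = '/<input type="hidden" id="(.+?)".+?value=\'&euro;(.+?)\'/';
$withS    = '/<input type="hidden" id="(.+?)".+?value=\'&euro;(.+?)\'/s';

$countWithout = preg_match_all($withoutS, $buffer, $m1);
$countWith = preg_match_all($withS, $buffer, $m2);

echo "without s: $countWithout, with s: $countWith\n";

// Captured id and numeric value from the /s version.
echo $m2[1][0], ' => ', $m2[2][0], "\n";
```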