PHP: how do I use preg_match?

Question

Possible Duplicate:
Best methods to parse HTML with PHP

I have data in which the following appears many times:

<td width="183">//I want to find what's here</td>

There is one such td for each item on the site. How do I get the content of each td?

Related: [Best methods to parse HTML with PHP](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php) — Wesley Murch, Jul 25 '11 at 16:39

cwallenpoole · Answer 1 · 2011-07-25T16:47:27.553

3

You're generally best off using DOMDocument for all HTML/XML parsing:

$dom = new DOMDocument();
$dom->loadHTML( '<html>...</html>' );
foreach( $dom->getElementsByTagName( 'td' ) as $node )
{
    echo $node->nodeValue;
}

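To get only the TDs with width="183", you can use DOMXPath:

$xpath = new DOMXPath($dom);

// "//td" matches td elements at any depth; "*/td" would only match direct children of the root element
$elements = $xpath->query("//td[@width='183']");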

foreach( $elements as $node )
{
    echo $node->nodeValue;
}
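
For reference, here is a minimal, self-contained sketch of the two snippets above put together. The sample markup is made up (the asker's real page is not shown), and `libxml_use_internal_errors()` is only there on the assumption that real-world HTML will trigger parser warnings you do not want printed.

```php
<?php
// Hypothetical sample markup; the real page's HTML is not shown in the question.
$html = '<table><tr>'
      . '<td width="183">First</td>'
      . '<td width="90">Skip me</td>'
      . '<td width="183">Second</td>'
      . '</tr></table>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Gather the text of every td whose width attribute is exactly "183".
$contents = [];
foreach ($xpath->query("//td[@width='183']") as $node) {
    $contents[] = trim($node->nodeValue);
}

print_r($contents);
```

With this sample, `$contents` should end up holding "First" and "Second", and the width="90" cell is skipped.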

edited Jul 25 '11 at 16:47

answered Jul 25 '11 at 16:41


But there are a lot of 'td' elements; I want exactly the ones with a 'width' of '183'. – user850019 Jul 25 '11 at 16:45
Can't you give me a way using 'preg_match'? There is another thing I will use 'preg_match' for, so if you give me some code, it will be easier for me to learn from it. – user850019 Jul 25 '11 at 16:48
@user You're better off and will learn more by asking another question. It is __bad practice__ to use regex for HTML if it can be avoided *at all*. – cwallenpoole Jul 25 '11 at 16:51

score 1 · Answer 2 · answered Jul 25 '11 at 16:47

Well, better not with preg_match... Better with:

php > $xml = new SimpleXmlElement('<root><td width="183">A</td><td width="182">B</td><td width="181">C</td></root>');
php > foreach($xml->xpath('//td[@width=183]') as $td) echo (string)$td,"\n";
A

or similar.

If you absolutely have to:

php > preg_match_all('/<td width="183">(.*?)<\\/td>/', '<root><td width="183">A</td><td width="182">B</td><td width="181">C</td></root>', $matches);
php > var_dump($matches);
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(22) "<td width="183">A</td>"
  }
  [1]=>
  array(1) {
    [0]=>
    string(1) "A"
  }
}

Anyway, as I said, the regex approach is easily broken and not recommended.
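
To illustrate what "easily broken" means, here is a small sketch that runs the same pattern against a few variations of the cell. The variations are hypothetical examples of markup a browser would render the same way. It uses `preg_match()`, which stops at the first match and returns 1 or 0, rather than `preg_match_all()`.

```php
<?php
// The pattern from above; it expects the opening tag to be exactly <td width="183">.
$pattern = '/<td width="183">(.*?)<\/td>/';

// Hypothetical variations of the same cell.
$variants = [
    'exact'        => '<td width="183">A</td>',
    'extra attr'   => '<td class="x" width="183">A</td>',
    'single quote' => "<td width='183'>A</td>",
    'uppercase'    => '<TD WIDTH="183">A</TD>',
    'multi-line'   => "<td width=\"183\">A\nB</td>",
];

$results = [];
foreach ($variants as $label => $html) {
    // preg_match() returns 1 if the pattern matched and 0 if it did not.
    $results[$label] = preg_match($pattern, $html, $m);
    echo $label, ': ', $results[$label] ? "matched '{$m[1]}'" : 'no match', "\n";
}
```

Only the 'exact' variant matches. Flags such as `i` and `s` would cover the uppercase and multi-line cases, but each new variation needs another patch to the pattern, which is the maintenance problem a DOM parser avoids.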

EDIT: I fixed the "only 183" part, which was not clear to me at first.

unlike DOM that has a loadHTML method, SimpleXML will fail when it's not valid XHTML — Gordon, Jul 25 '11 at 16:50
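
One way to work around the point in the comment above, sketched under the assumption that you still want SimpleXML's syntax: parse with `DOMDocument::loadHTML()` first, then hand the tree over with `simplexml_import_dom()`. The sloppy sample markup is made up for the demonstration.

```php
<?php
// Hypothetical sloppy HTML: an unclosed <br> and an unquoted attribute,
// so it is not well-formed XML.
$html = '<table><tr><td width="183">A<br></td><td width=182>B</td></tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output

// Parsing the string directly as XML throws, because it is not well-formed.
try {
    new SimpleXMLElement($html);
    $strictOk = true;
} catch (Exception $e) {
    $strictOk = false;
}

// The HTML parser tolerates the sloppy markup and builds a usable tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Convert the DOM tree to SimpleXML and query it as in the answer.
$xml = simplexml_import_dom($dom);

$found = [];
foreach ($xml->xpath('//td[@width=183]') as $td) {
    $found[] = trim((string) $td);
}

echo $strictOk ? "strict XML parse ok\n" : "strict XML parse failed\n";
print_r($found);
```

Here the direct SimpleXML parse fails, while the DOM route still finds the cell containing "A".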

score 1 · Answer 3 · answered Jul 25 '11 at 16:48

Use preg_match_all(). Check this example out:

<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[2] . "\n";
    echo "part 3: " . $val[3] . "\n";
    echo "part 4: " . $val[4] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: b
part 3: bold text
part 4: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: a
part 3: click me
part 4: </a>

As you can see, you can echo $val[3] to get what is inside the HTML tags. I got the example from the PHP manual:

http://www.php.net/manual/en/function.preg-match-all.php
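
The manual example passes PREG_SET_ORDER, which changes how `$matches` is laid out compared with the default (PREG_PATTERN_ORDER). Here is a rough sketch of the difference, applied to td cells like the ones in the question. The sample string and the pattern are my own assumptions, not taken from the manual.

```php
<?php
// Hypothetical sample input and pattern, for illustration only.
$html    = '<td width="183">A</td><td width="182">B</td><td width="183">C</td>';
$pattern = '/<td width="(\d+)">(.*?)<\/td>/';

// Default ordering (PREG_PATTERN_ORDER): one array per capture group.
// $byGroup[1] holds every width, $byGroup[2] holds every cell content.
preg_match_all($pattern, $html, $byGroup);

// PREG_SET_ORDER: one array per match, as in the manual example.
// $byMatch[0] holds the full text, width and content of the first cell.
preg_match_all($pattern, $html, $byMatch, PREG_SET_ORDER);

foreach ($byMatch as $set) {
    echo "width {$set[1]}: {$set[2]}\n";
}
```

PREG_SET_ORDER tends to be more convenient when you loop over matches one at a time. The default ordering is handier when you only want one column, for example all cell contents in `$byGroup[2]`.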
