-4

I am building a PHP data miner (scraper). I have this HTML snippet:

<label class='area'>
  <font class='bg_info' onmouseover="land_convert_txt(this,3067)" onmouseout='tooltip_hide()'>
   3,067 Sq. Ft.
  </font>

How do I set up my regex to extract only the area value?

This is my function:

function extract_regex($subject, $regex, $index = 1)
{
    preg_match_all($regex, $subject, $matches);
    if (count($matches[$index]))
    {
        if (count($matches[$index]) == 1)
        {
            return trim($matches[$index][0]);
        }
        return $matches[$index];        
    }
    return '';
}

The number in (this,3067) keeps changing!

Thank you in advance.

Hassan Kazem

2 Answers

1

Don't use regex to handle HTML!
Don't try to re-invent the wheel, you will probably create a square.

Try using a PHP web scraper, such as the Simple HTML DOM library:

http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

Use code like this:

# create and load the HTML
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load($myHTML);

# get the elements with class "area" (find() returns an array)
//$element = $html->find('label[class=area]');
$element = $html->find(".area");

# echo the text of the first match, without the inner <font> tag
echo trim($element[0]->plaintext);
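
If installing a third-party library is not an option, the same parser-based approach can be sketched with PHP's built-in DOM extension. This is only a minimal sketch: the variable names are illustrative, and the closing </label> is an assumption added to make the question's snippet complete.

```php
<?php
// Sample markup from the question; the closing </label> is assumed.
$myHTML = <<<HTML
<label class='area'>
  <font class='bg_info' onmouseover="land_convert_txt(this,3067)" onmouseout='tooltip_hide()'>
   3,067 Sq. Ft.
  </font>
</label>
HTML;

// Parse the fragment; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($myHTML);
libxml_clear_errors();

// Select the label by its class and read its text content,
// which ignores the <font> tag and its changing attributes.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//label[@class='area']");
$area  = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';

echo $area, "\n";
```

Because the text is read from the parsed tree, the changing number in the onmouseover attribute never has to be matched at all.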
funerr
  • Thank you very much, but I built my whole library using regex and it would cost a lot of time to change it; the area part is the only thing I need a regex for. – Hassan Kazem Jun 30 '13 at 11:23
  • 2
    The time you'll save on maintenance is worth the time you put in to do it right in the first place. – Herbert Jun 30 '13 at 12:50
0
function extract_regex($subject, $regex, $index = 1)
{
    preg_match_all($regex, $subject, $matches);
    if (count($matches[$index]))
    {
        if (count($matches[$index]) == 1)
        {
            return trim($matches[$index][0]);
        }
        return $matches[$index];
    }
    return '';
}

$out = extract_regex("<label class='area'><font class='bg_info' onmouseover='land_convert_txt(this,3067)' onmouseout='tooltip_hide()'>3,067 Sq. Ft.</font></label>","/<label class=\'area\'>(.*)<\/label>/i");

echo "<xmp>". $out . "</xmp>";
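
Note that (.*) in the pattern above captures everything inside the label, including the <font> tag itself. A possible variant that captures only the text is sketched below; the pattern and variable names are illustrative, and it assumes the markup always has the label/font shape shown in the question, with a closing </label> added here.

```php
<?php
// Multi-line markup as in the question; the closing </label> is assumed.
$html = "<label class='area'>\n"
      . "  <font class='bg_info' onmouseover=\"land_convert_txt(this,3067)\" onmouseout='tooltip_hide()'>\n"
      . "   3,067 Sq. Ft.\n"
      . "  </font>\n"
      . "</label>";

// [^>]* skips the font tag's attributes, so the changing (this,3067) part
// does not matter. ([^<]+?) captures the text up to the next tag, and the
// surrounding \s* keeps leading/trailing whitespace out of the capture.
$regex = '~<label class=\'area\'>\s*<font[^>]*>\s*([^<]+?)\s*</font>~i';

$area = preg_match($regex, $html, $m) ? $m[1] : '';
echo $area, "\n";

// Optional: reduce the text to a plain integer.
$sqft = preg_match('/[\d,]+/', $area, $n) ? (int) str_replace(',', '', $n[0]) : null;
echo $sqft, "\n";
```

The same $regex can be passed to extract_regex() with the default $index of 1, since the area text is the first capture group.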
Ahmed Atta