
I have code like this:

<div class="rgz">
  <div class="xyz">
  </div>
  <div class="ckh">
  </div>
</div>

The div with class ckh won't appear every time. Can someone suggest a regex to get the data of div rgz? The data inside ckh is not needed, but that div won't always be present. Thanks in advance.

Anish Joseph

2 Answers


Regex is probably not your best option here.

A JavaScript framework such as jQuery will let you use CSS selectors to reach the element you require, with something like:

$('.rgz').children().not('.ckh').html()
David
  • Actually, I am crawling data from a site using cURL. It has nothing to do with jQuery or JavaScript. – Anish Joseph May 08 '11 at 19:47
  • Still, I don't think regex is the best option. It will bite you in the ass eventually when the markup you are scraping changes. You should probably look at a PHP DOM parser; a quick Google search found me this: http://simplehtmldom.sourceforge.net/ but I am not a PHP guy, so I can't vouch for it. – David May 08 '11 at 19:49
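For the cURL case, a rough PHP counterpart of the jQuery selector idea can be written with the built-in `DOMXPath` class. This is a minimal sketch, not David's code: the function name `rgzChildren` is illustrative, it assumes the page HTML is already in a string (for example the return value of `curl_exec()`), and it matches `class` attributes exactly, so `class="rgz other"` would not match.

```php
<?php
// Illustrative sketch: PHP counterpart of $('.rgz').children().not('.ckh').
// Returns the outer HTML of each child element of <div class="rgz">,
// skipping the optional <div class="ckh">. Assumption: exact class match.
function rgzChildren(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // scraped HTML is rarely valid; suppress warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Element children of .rgz whose class is not exactly "ckh"
    $nodes = $xpath->query('//div[@class="rgz"]/*[not(@class="ckh")]');

    $out = [];
    foreach ($nodes as $node) {
        $out[] = $doc->saveHTML($node);   // serialise only this node
    }
    return $out;
}

$html = '<div class="rgz"><div class="xyz">keep me</div><div class="ckh">skip me</div></div>';
print_r(rgzChildren($html));
```

The XPath predicate plays the role of `.not('.ckh')`, so when the `.ckh` div is absent there is simply nothing to skip.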

@diEcho and @Dve are correct: you should learn to use something like PHP's native DOMDocument class rather than regex. Your code will be easier to read and maintain, and it will handle malformed HTML much better.

Here is some sample code which may or may not do what you want:

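$contents = '';
$doc = new DOMDocument();
// $page_url is the page being scraped. load() expects well-formed XML,
// so use loadHTMLFile() (or loadHTML() on a string fetched with cURL).
libxml_use_internal_errors(true); // don't emit warnings for malformed markup
$doc->loadHTMLFile($page_url);
$nodes = $doc->getElementsByTagName('div');
foreach ($nodes as $node)
{
   if($node->hasAttributes()){
      $attributes = $node->attributes;
      if(!is_null($attributes)){
         foreach ($attributes as $index=>$attr){
            // exact match: class="rgz", not class="rgz other"
            if($attr->name == 'class' && $attr->value == 'rgz'){
               // nodeValue is the element's text, including any .ckh child
               $contents .= $node->nodeValue;
            }
         }
      }
   }
}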
jisaacstone
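The sample above collects `nodeValue`, which still includes the text of the `.ckh` div whenever it is present. Below is a minimal sketch of one way to drop it first, assuming the HTML is already in a string (such as the return value of `curl_exec()` with `CURLOPT_RETURNTRANSFER` set). `rgzText` is an illustrative name, and class matching is exact, as in the answer's loop.

```php
<?php
// Illustrative sketch: text of every <div class="rgz">, with any
// <div class="ckh"> child removed first, so it works whether or not .ckh exists.
function rgzText(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed scraped HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $contents = '';
    foreach ($xpath->query('//div[@class="rgz"]') as $rgz) {
        // Copy to an array before removing, so iteration is unaffected
        foreach (iterator_to_array($xpath->query('./div[@class="ckh"]', $rgz)) as $ckh) {
            $rgz->removeChild($ckh);
        }
        $contents .= trim($rgz->textContent);
    }
    return $contents;
}

echo rgzText('<div class="rgz"><div class="xyz">keep</div><div class="ckh">skip</div></div>'), "\n";
```

Removing `.ckh` before reading `textContent` means the optional div never has to be special-cased later, which is the situation the question describes.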