
I'm trying to get the content of a certain div from a page and store it in my db. I did the following:

$html = file_get_contents($url);
$dom = new SmartDOMDocument();
$dom->loadHTML($html);
$div_tags = $dom->getElementsByTagName('div');
foreach ($div_tags as $element) {
    // Match any div whose itemprop attribute contains "description"
    if (strpos($element->getAttribute('itemprop'), 'description') !== false) {
        $description = $element->nodeValue; // text content only, tags are stripped
    }
}

I used SmartDOMDocument because it handles UTF-8 better than DOMDocument.

Now, this gives me the text of the element without the tags. I tried this solution and it did give me the text with the tags. However, when I tried to store that result in my db, the insert failed!

Is there a better way to get the utf-8 text with the tags from the element and store it properly in a db?

EDIT: the insert statement is pretty simple:

$q = "INSERT INTO `MyTable`.`content` (`description`) VALUES ('$description')";
$r=mysql_query($q); 
var_dump($r);
iTurki
  • `when I tried to store it in my db, I couldn't!` Why? Any errors? What happens? – Prix Aug 03 '13 at 21:02
  • I tried `var_dump()` the result and it gave me `bool(false)`. Nothing else. – iTurki Aug 03 '13 at 21:03
  • Well, looking at your code, you have a foreach, an if, and the element. Do you want to catch multiple items or just one? If just one, add a break after the if to leave the foreach, as it may be hitting another element that is empty. – Prix Aug 03 '13 at 21:05
  • You are right. But I know I got the right item. `$description` is returning the expected result. – iTurki Aug 03 '13 at 21:09
  • Could you add code that is responsible for DB-operations? – Dmitrii Tarasov Aug 03 '13 at 21:09
  • See the edit. It is a simple insert. And it works when I do `$description = $element->nodeValue;` – iTurki Aug 03 '13 at 21:16
  • @iturki use `mysql_real_escape_string($element)`. Also, if you use DomDocument instead of SmartDOMDocument, does it return the HTML you wanted? If so, it may be easier to just use the mb functions and convert anything needed back to UTF-8. – Prix Aug 03 '13 at 21:20
  • `mysql_real_escape_string()` takes a string as a parameter, not a DOMElement! – iTurki Aug 03 '13 at 21:23
  • @iturki not after you take the nodeValue, as it turns into a string. Have you tried getting the element's nodeValue using DomDocument alone? – Prix Aug 03 '13 at 22:39
  • Yes. I tried both. Unfortunately, both returned the string _without_ tags. – iTurki Aug 03 '13 at 23:11
  • @iturki have you tried using `parentNode`, as in `$element->parentNode->nodeValue`? – Prix Aug 03 '13 at 23:16
  • Although this will return unwanted data, still `nodeValue` returned the text only. – iTurki Aug 04 '13 at 00:37

2 Answers


Try var_dump-ing the $element to see whether it has properties other than nodeValue. There may also be something like an HTMLvalue or getHTML property or method that returns the markup.
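
As far as I know, DOMElement itself has no HTML-returning property, but the owning document can serialize a single node: `DOMDocument::saveHTML()` accepts a node argument since PHP 5.3.6. Below is a minimal sketch of that approach, not a tested drop-in. The sample markup is made up for illustration, plain DOMDocument is used instead of SmartDOMDocument, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround to make `loadHTML()` treat the input as UTF-8.

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$html = '<html><body><div itemprop="description"><p>Hello <b>world</b></p></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$description = '';
foreach ($dom->getElementsByTagName('div') as $element) {
    if (strpos($element->getAttribute('itemprop'), 'description') !== false) {
        // Serialize every child node to build the inner HTML, tags included.
        foreach ($element->childNodes as $child) {
            $description .= $dom->saveHTML($child);
        }
        break; // stop at the first matching div
    }
}

echo $description;
```

Calling `$dom->saveHTML($element)` instead of looping over the children would include the wrapping div itself.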

svecon

Try using textContent instead of nodeValue. And do not forget about escaping (I assume mysql_real_escape_string in this case).
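
A likely reason the insert returns `bool(false)` once the string contains markup is that quote characters in the HTML end the SQL string literal early. An alternative to manual escaping is a prepared statement, which also avoids the `mysql_*` functions (removed in PHP 7). The sketch below is only an illustration under assumptions: it uses PDO with an in-memory SQLite database as a stand-in so it runs without a server, and the table layout and sample string are made up. For MySQL you would swap in a DSN along the lines of `mysql:host=...;dbname=...;charset=utf8mb4` with your own credentials.

```php
<?php
// Hypothetical HTML fragment containing a quote that would break a naive query.
$description = "<p class='intro'>It's <b>bold</b></p>";

// Stand-in database (assumption): in-memory SQLite via PDO.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // throw on SQL errors
$pdo->exec('CREATE TABLE content (description TEXT)');

// The placeholder keeps the value out of the SQL text, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO content (description) VALUES (:description)');
$stmt->execute(array(':description' => $description));

// Read the row back to confirm the markup was stored unchanged.
$stored = $pdo->query('SELECT description FROM content')->fetchColumn();
var_dump($stored === $description);
```

With exceptions enabled, a failing query reports the actual SQL error rather than a bare `false`.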

Dmitrii Tarasov