HTML - Extract information

Question

I need to extract some information from HTML code. I have these two structures:

<p>Street 1a</p>
<p>12345 Berlin</p>

and

<p>
Street 1a
<br>
12345 Berlin
</p>

My question is how to extract the string 'Street 1a' from both structures with one method.

I thought about writing a method for every possible HTML structure, but this is far too much work. I also thought about parsing the whole HTML code and doing pattern matching, but that is not very elegant either, for example:

$xml = new DOMDocument();
libxml_use_internal_errors(true);

// Load the url's contents into the DOM
$xml->loadHTMLFile($url);
libxml_clear_errors();

// pattern matching now

Does anybody have experience with this?

Greetings and thanks!

possible duplicate of [DOMDocument for parsing HTML (instead of regex)](http://stackoverflow.com/questions/7324620/domdocument-for-parsing-html-instead-of-regex) — ThW, Dec 03 '14 at 13:29

score -1 · Answer 1 · answered Dec 03 '14 at 13:04

<div id="extract">
    <p>Street 1a</p>
    <p>12345 Berlin</p>
</div>

Your script should look like this:

$(document).ready(function() {
    $('#extract p').each(function() {
        console.log($(this).text());
    });
});
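
The loop above prints every paragraph, so it only isolates 'Street 1a' in the first structure. A minimal sketch of one way to cover both structures, assuming the address is wrapped in the `<div id="extract">` shown above: walk the DOM and return the first non-empty text node. The helper name `firstTextLine` is hypothetical, and it uses plain DOM properties (`nodeType`, `nodeValue`, `childNodes`) instead of jQuery.

```javascript
// Hypothetical helper: return the first non-empty piece of text inside `root`,
// or null if there is none. Works on any DOM-like node that exposes
// nodeType, nodeValue and childNodes.
function firstTextLine(root) {
    if (!root) {
        return null;
    }
    var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
    var children = root.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === TEXT_NODE) {
            // Skip whitespace-only text such as indentation between tags
            var text = (child.nodeValue || '').trim();
            if (text !== '') {
                return text;
            }
        } else {
            // Descend into elements like <p>; <br> has no children and yields null
            var nested = firstTextLine(child);
            if (nested !== null) {
                return nested;
            }
        }
    }
    return null;
}

// Browser usage, assuming the wrapper element has id "extract"
if (typeof document !== 'undefined') {
    console.log(firstTextLine(document.getElementById('extract')));
}
```

In the first structure the walk descends into the first `<p>`. In the second it stops at the text before the `<br>`. Both give 'Street 1a'.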

Gold Pearl
