
I've got an HTML string that might look something like this:

<body>
  <div>
    <span class="blah">Monkey </span>
    <p>breath really <b>stinks</b></p>
    And I don't like it!
  </div>
</body>

As you can see, some text sits directly inside an element, while other elements contain a mix of text nodes and child elements. I would like to get all of the text values under body (assume body is a DOMElement that I have stored in a variable).

So, the output would look something like:

Monkey breath really stinks And I don't like it!

How would I do this? XPath? Regexps? Magic?

hakre
jwegner
  • Try magic; when it fails, use JavaScript. – Chibuzo Apr 06 '12 at 02:01
  • @jwegner - why do you want to do this? What's the use case? – Flukey Apr 06 '12 at 02:03
  • 1
    @Flukey Similar to "link density" as discussed [here](http://stackoverflow.com/questions/3652657/what-algorithm-does-readability-use-for-extracting-text-from-urls), I would like to calculate the density for an HTML form – jwegner Apr 06 '12 at 12:08
  • @Chibuzo, I've been trying magic, but I keep getting syntax errors. Also, I can't use JavaScript because the HTML is loaded into PHP via cURL. – jwegner Apr 06 '12 at 12:09

1 Answer


If you don't mind using jQuery, I might have an answer for this.

First, crawl the content: use PHP's cURL functions to fetch the page and echo the content. Once the content is in the body, trigger a jQuery function containing the following line. Supposing all the text is contained in a div with the id content,

$('#content').text()

gives you the required output.
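jQuery's .text() returns the combined text of the matched elements and all of their descendants. The sketch below illustrates that idea without a browser: a depth-first walk that concatenates every text node in document order, followed by a separate whitespace-collapsing step to get the single-line output from the question (.text() itself does not collapse whitespace). The plain-object node shape ({ type, value, children }) and the function names collectText and normalize are hypothetical, chosen only so the example runs in Node; this is not jQuery's actual implementation.

```javascript
// Hypothetical node shape, used only so this runs without a browser DOM:
//   text node:    { type: "text", value: "..." }
//   element node: { type: "element", children: [ ...nodes ] }

// Depth-first walk: return a text node's value, or the concatenated
// text of an element's children in document order.
function collectText(node) {
  if (node.type === "text") {
    return node.value;
  }
  return (node.children || []).map(collectText).join("");
}

// Collapse runs of whitespace (indentation, newlines) into single spaces
// and trim the ends, so source formatting does not leak into the result.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Hand-built tree mirroring the <div> from the question
const div = {
  type: "element",
  children: [
    { type: "text", value: "\n    " },
    { type: "element", children: [{ type: "text", value: "Monkey " }] }, // <span>
    { type: "text", value: "\n    " },
    {
      type: "element", // <p>
      children: [
        { type: "text", value: "breath really " },
        { type: "element", children: [{ type: "text", value: "stinks" }] }, // <b>
      ],
    },
    { type: "text", value: "\n    And I don't like it!\n  " },
  ],
};

console.log(normalize(collectText(div)));
// → Monkey breath really stinks And I don't like it!
```

The walk itself is language-neutral: the same recursion over text and element nodes applies to any DOM-like tree, including one parsed on the server.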

Remember to use jQuery's delegate() to bind the function to whichever event you choose.

Krishna Deepak
  • This is inside of PHP, and the HTML is gathered via cURL. Unfortunately jQuery is not an option. – jwegner Apr 06 '12 at 12:04
  • I am using the same thing daily. I will just edit this answer to give you a complete picture – Krishna Deepak Apr 07 '12 at 14:43
  • 1
    No, really, this can't be done in the frontend. There literally _is no_ frontend. Think of this as being a sort of API function - something that runs entirely on the server, and the parsed result is passed to the user via JSON. – jwegner Apr 09 '12 at 12:12