I need to extract content from an existing site. The HTML is brutal, but so far I've been able to pull all of the existing content into tables except for one bit of text.

I've searched around here to no avail. Here's a sample of the markup:

<div id="content">
  <div class="comments">
    My comment<br />
    Name <br /> 
    Mytown, NY USA
  </div>
  - Wednesday, December 07, 2005 at 07:20:47 (EST)
  <hr />
  <div class="comments">
      My Comment 2<br />
      2nd Person's name <br />
      My Town, USA
  </div>
  - Wednesday, November 02, 2005 at 18:48:38 (EST)
  <hr />
</div>

I have to parse tons of entries like these. I have all the other pieces, but how do I target the text in each entry that starts immediately after the closing </div> and ends at the <hr />?

mplungjan
Mike S

2 Answers

To achieve this you need to retrieve the text node that is the next sibling of the div. You can read the nextSibling property of the underlying DOM element to get it:

$('.comments').each(function() {
    // 'this' is the raw DOM element; its next sibling is the text node after </div>
    var text = this.nextSibling.nodeValue.trim();
    // work with the value here...
    console.log(text);
});

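
If some entries might not be followed by a text node (for example, a div followed directly by the <hr />), a small guard avoids a null error. The following is only a sketch: trailingText is a hypothetical helper name, and the value 3 is the DOM's Node.TEXT_NODE constant, hard-coded here so the function also runs outside a browser.

```javascript
// Node.TEXT_NODE is 3 in the DOM; hard-coded so the helper has no browser dependency.
const TEXT_NODE = 3;

// Hypothetical helper: returns the trimmed text that directly follows `element`,
// or null when the next sibling is missing or is not a text node.
function trailingText(element) {
  const sibling = element.nextSibling;
  if (!sibling || sibling.nodeType !== TEXT_NODE) {
    return null;
  }
  return sibling.nodeValue.trim();
}
```

With jQuery on the page it could be called as $('.comments').each(function() { console.log(trailingText(this)); });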

Alternatively, you could build an array of all the text values and work with them as required later in your logic:

var dates = $('.comments').map(function() {
    // 'this' is the raw DOM element, so no jQuery wrapper is needed
    return this.nextSibling.nodeValue.trim();
}).get();

// use 'dates' variable as required...
console.log(dates);

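
Since the end goal is to load these values into tables, it may help to split each string into named parts. This is a rough sketch that assumes every line follows the layout shown in the question ("- Weekday, Month DD, YYYY at HH:MM:SS (ZONE)"); parseEntryDate is a hypothetical helper, not part of jQuery.

```javascript
// Hypothetical helper: splits one scraped line such as
// "- Wednesday, December 07, 2005 at 07:20:47 (EST)" into named parts.
// Returns null when the line does not match the assumed layout.
function parseEntryDate(line) {
  const pattern = /^-?\s*(\w+),\s+(\w+)\s+(\d{1,2}),\s+(\d{4})\s+at\s+(\d{2}:\d{2}:\d{2})\s+\((\w+)\)$/;
  const match = pattern.exec(line.trim());
  if (!match) {
    return null;
  }
  const [, weekday, month, day, year, time, zone] = match;
  return { weekday, month, day: Number(day), year: Number(year), time, zone };
}
```

Returning null for lines that do not match makes it easy to spot entries whose format differs from the sample.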

Rory McCrossan

How about removing the divs from a clone of the HTML and splitting the text that is left?

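// work on a copy so the page itself is left untouched
var cloned = $("#content").clone();
cloned.find("div").remove();
// only the date lines remain, each one starting with "-"
var strings = $.map(cloned.text().split("-"), $.trim);
strings.shift(); // drop the empty string left over from before the first date
console.log(strings);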
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="content">
  <div class="comments">
    My comment
    <br />Name
    <br />Mytown, NY USA
  </div>
  - Wednesday, December 07, 2005 at 07:20:47 (EST)
  <hr />
  <div class="comments">
    My Comment 2
    <br />2nd Person's name
    <br />My Town, USA
  </div>
  - Wednesday, November 02, 2005 at 18:48:38 (EST)
  <hr />
</div>
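
One caveat with splitting on "-": it also splits on any hyphen inside the trailing text itself. If the markup is available as a string, a variant that keys on the surrounding tags instead might look like the sketch below. It assumes nothing but plain text sits between each closing </div> and the following <hr />, and extractTrailingText is a hypothetical name. A regular expression is not a general HTML parser, so this only suits markup as regular as the sample.

```javascript
// Hypothetical helper: collects the text between each closing </div> and the next <hr>.
// Assumes no other tags appear between the two.
function extractTrailingText(html) {
  const pattern = /<\/div>([^<]*)<hr\s*\/?>/gi;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    // Strip the leading "- " marker and surrounding whitespace.
    results.push(match[1].replace(/^\s*-\s*/, "").trim());
  }
  return results;
}
```

In the browser it could be fed with extractTrailingText($("#content").html()).
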
mplungjan