
I'm new to regular expressions. I have a string like the code below, and I want to get the text that comes after all the div tags have closed.

<div class="bbcode_container">
  <div class="bbcode_quote">
    <div class="quote_container">
      <div class="bbcode_quote_container">
      </div>
      <div class="bbcode_postedby">
        <img border="0" src="http://www.webketoan.vn/forum/images/misc/quote_icon.png" alt="Click here to enlarge" onclick="window.open(this.src)" style="max-width: 700px; cursor: pointer;" title="Click here to enlarge"> Nguyên văn bởi <strong>namphong13</strong>
        <a rel="nofollow" href="http://www.webketoan.vn/forum/f94/ket-qua-thi-cong-chuc-thue-126218-post842693.html#post842693"><img border="0" src="http://www.webketoan.vn/forum/images/buttons/viewpost-right.png" class="inlineimg" alt="Click here to enlarge" onclick="window.open(this.src)" style="max-width: 700px; cursor: pointer;" title="Click here to enlarge"></a>
      </div>
      <div class="message">Can you help me?<br>
      </div>
    </div>
  </div>
</div>

How can I do it?

ekremkaraca
khanh

3 Answers

  • Do you want to check whether the text "Thanks for support" appears in your page? Then your regex would look like:

    match = html_string[/Thanks for support/]

    If match is not nil, then html_string contains that text.

  • If you want to capture the text after the last closing div tag, you could use:

    html_string =~ /.*<\/div>\n([a-zA-Z\s]*)$/

    puts $1   # the captured text; this pattern allows only letters and whitespace
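
A minimal sketch of that second case, assuming the trailing text may contain punctuation or span several lines. The sample html_string is made up for illustration. The m flag lets . match newlines, and \z anchors at the end of the string rather than the end of a line.

```ruby
# Hypothetical sample input; the real page would be much longer.
html_string = <<~HTML
  <div class="outer">
    <div class="message">Can you help me?<br>
    </div>
  </div>
  Thanks for support
HTML

# With the m flag, the greedy .* runs to the LAST </div>;
# the capture group then takes everything up to the end of the string.
trailing = html_string[/.*<\/div>(.*)\z/mi, 1]

# to_s guards against nil when the string contains no </div> at all.
trailing = trailing.to_s.strip

puts trailing
```

If the string has no closing div tag, trailing ends up as an empty string instead of raising an error.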

Tudor Constantin

You should use an HTML parser like Nokogiri for this.

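require 'nokogiri'

page = Nokogiri::HTML(my_file)
# remove every div element, including nested divs and the text inside them
page.search('div').remove
# what remains is the text outside the divs
string = page.text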
David
  • Yes, thanks. I only want to get the text after all the div tags end; I don't want to get all the text. – khanh Jun 18 '11 at 06:09
  • `page.search('div').remove` removes all the divs, leaving just the text you want. – David Jun 18 '11 at 06:11
  • With your code it will show "Can you help me? Thanks for support", but I only want to show "Thanks for support". – khanh Jun 18 '11 at 06:27
  • That's incorrect. "Can you help me?" is inside a div, so it will be removed as well. – David Jun 26 '11 at 06:30

Use the code below to remove everything up to and including the last (case-insensitive) occurrence of "</div>":

input = 'a</div>b</DIV>c'
output = input.gsub(/.*<\/div>/i,'')    # => "c"
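
The one-liner above operates on a single line, because in Ruby . does not match newlines by default. For multi-line HTML like the question's, a sketch with the m flag might look like this; the input below is an invented sample, not the asker's actual page.

```ruby
# Invented multi-line sample for illustration.
input = "<div>\n  <div>Can you help me?</div>\n</DIV>\nThanks for support"

# Without the m flag, .* stops at each newline, so only the part of each
# line up to its own </div> is removed and the other lines survive.
single_line = input.gsub(/.*<\/div>/i, '')

# With the m flag, the greedy .* spans newlines and runs to the last </div>.
multi_line = input.gsub(/.*<\/div>/im, '').strip

p single_line
p multi_line
```

The strip call only trims the newline left over after the final closing tag.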
David Grayson