1

I need to get the "274.20p" out of:

<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>

I would like to match it with a regular expression keyed on:

<td    class="dataRegularUlOn"    >

so something like:

/<td(.*?)class="dataRegularUlOn"(.*?)>/

I'm using Ruby on Linux.

Thanks

Steven
  • A "ruby html parser" might be better suited to this task: http://ruby-toolbox.com/categories/html_parsing.html – VonC May 16 '10 at 10:44
  • Obligatory Cthulhu link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm May 16 '10 at 23:00

4 Answers

1

Why do you want to write your own HTML parser when there are plenty of perfectly capable HTML parsers already out there?

require 'nokogiri'

doc = Nokogiri::HTML('
    <td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">
        274.20p
    </td>')

# strip removes the whitespace and newlines surrounding the cell text
p doc.search('.dataRegularUlOn').map { |td| td.text.strip }
# => ["274.20p"]
Jörg W Mittag
  • This is the perfect method, but at uni I can't install gems... :( – Steven May 19 '10 at 17:02
  • @Steven: Really? Not even inside your home directory? You can set the environment variables `GEM_HOME` and `GEM_PATH` to point somewhere inside your `$HOME` directory. In fact, if you call `gem install` and it detects that it can't write to the system directory, it should automatically fall back to the home directory. Anyway, there is a lenient XML library in the stdlib that can also parse many HTML documents without requiring a third-party library: `REXML` (`require 'rexml/document'`). – Jörg W Mittag May 19 '10 at 17:56
  • @Steven: Also, you don't actually *have* to install gems. You can also just install the files yourself somewhere in your `$HOME` directory and add that directory to Ruby's `$LOAD_PATH`. – Jörg W Mittag May 19 '10 at 17:58
0

Why not use something like http://github.com/whymirror/hpricot instead? Then you can just use the XPath to the element to retrieve the value.

Jamie
0

Are you parsing an HTML file? I think you should use XPath, which is really easy to use. For Ruby there is Nokogiri.

Using a regexp, I would do it like this:

# Capture the price itself: digits, a dot, one or two digits, then "p"
ruby_sub_string = /(\d+\.\d{1,2}p)/.match(my_string)
ruby_sub_string[1]

It should do the trick. I can't try it right now, though.
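
A minimal sketch of matching on the price format itself, assuming the value is always digits, a dot, one or two digits, then `p`. The names `PRICE_PATTERN` and `extract_price` are hypothetical, chosen only for illustration:

```ruby
# Assumed price format: digits, a dot, one or two digits, then a literal "p".
PRICE_PATTERN = /\d+\.\d{1,2}p/

# Hypothetical helper: returns the first price-like token, or nil if none is found.
def extract_price(my_string)
  # String#[] with a Regexp returns the matched substring (or nil)
  my_string[PRICE_PATTERN]
end

my_string = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'
puts extract_price(my_string)
# prints 274.20p
```

Note that this ignores the markup entirely, so it would also pick up a price-like string anywhere else in the input.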

dierre
0

Try this regular expression:

/<td[^>]*class="dataRegularUlOn"[^>]*>([^<]*)<\/td>/
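
A rough sketch of how this pattern might be applied with `String#scan`. The `\s*` around the capture group is an addition of mine (so that cells padded with whitespace still match), and `CELL_PATTERN` and `cell_values` are hypothetical names:

```ruby
# The pattern above, with \s* added around a lazy capture (an assumption)
# so that cell text padded with spaces or newlines is returned trimmed.
CELL_PATTERN = /<td[^>]*class="dataRegularUlOn"[^>]*>\s*([^<]*?)\s*<\/td>/

# Hypothetical helper: returns the text of every matching cell.
def cell_values(html)
  # scan yields one array of captures per match; flatten gives a flat list
  html.scan(CELL_PATTERN).flatten
end

html = <<~HTML
  <td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>
  <td class="dataRegularUlOn">
      1.5p
  </td>
HTML

p cell_values(html)
# => ["274.20p", "1.5p"]
```

This only holds while the cell contains plain text and the `class` attribute is written exactly this way; nested tags or a different quoting style would need a real HTML parser.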
rlandster