Ruby Regular Expressions - Checking the start, middle, and end of a line?

Question

I need to get the "274.20p" out of:

<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>

I would like to match the opening tag with a regular expression:

<td    class="dataRegularUlOn"    >

so something like:

/<td(.*?)class="dataRegularUlOn"(.*?)>/

I'm using Ruby on Linux.

Thanks.

A "Ruby HTML parser" might be better suited to this task: http://ruby-toolbox.com/categories/html_parsing.html – VonC, May 16 '10 at 10:44
Obligatory Cthulhu link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm, May 16 '10 at 23:00

score 1 · Answer 1 · answered May 16 '10 at 12:36

Why do you want to write your own HTML parser when there are plenty of perfectly capable HTML parsers already out there?

require 'nokogiri'

doc = Nokogiri::HTML('
    <td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">
        274.20p
    </td>')

# strip removes the newlines and indentation around the cell text
p doc.search('.dataRegularUlOn').map { |td| td.text.strip }
# => ["274.20p"]


Jörg W Mittag


This is the perfect method, but at uni, I can't install gems... :( – Steven May 19 '10 at 17:02
@Steven: Really? Not even inside your home directory? You can set the environment variables `GEM_HOME` and `GEM_PATH` to point somewhere inside your `$HOME` directory. In fact, if you call `gem install` and it detects that it can't write to the system directory, it should actually automatically fall back to the home directory. Anyway, there is an XML library in the stdlib that can also parse many HTML documents and doesn't require a third-party library: `REXML` (`require 'rexml/document'`). – Jörg W Mittag May 19 '10 at 17:56
@Steven: Also, you don't actually *have* to install gems. You can also just install the files yourself somewhere in your `$HOME` directory and add that directory to Ruby's `$LOAD_PATH`. – Jörg W Mittag May 19 '10 at 17:58
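
A minimal sketch of the REXML route mentioned in the comment above, assuming the fragment is well-formed XML, as the sample cell from the question happens to be. The variable names are illustrative. Real-world HTML with unclosed tags or unquoted attributes will likely make REXML raise a parse error, and on recent Ruby versions REXML ships as a bundled gem rather than as part of the core standard library.

```ruby
require 'rexml/document'

# Sample cell taken from the question; it happens to be well-formed XML.
html = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'

doc = REXML::Document.new(html)

# Find the first <td> whose class attribute is exactly "dataRegularUlOn".
cell = REXML::XPath.first(doc, '//td[@class="dataRegularUlOn"]')

# cell is nil when nothing matches, so guard before reading the text.
price = cell ? cell.text.strip : nil

p price # => "274.20p"
```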

score 0 · Answer 2 · answered May 16 '10 at 10:47

Why not use something like Hpricot (http://github.com/whymirror/hpricot) instead? Then you can just use an XPath expression for the element to retrieve the value.


Jamie


Again, same problem as above :p I cannot use gems at the moment ;) they don't install – Steven May 19 '10 at 17:02

score 0 · Answer 3 · answered May 16 '10 at 10:49

Are you parsing an HTML file? I think you should use XPath, which is really easy to use. For Ruby there is Nokogiri.

Using a regexp, I would do it like this:

# Capture the price itself (digits, a dot, one or two digits, then "p")
ruby_sub_string = /(\d+\.\d{1,2}p)/.match(my_string)
ruby_sub_string[1]

It should do the trick. I can't try it right now though.
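
As a rough illustration of matching on the value rather than on the tag, here is a small sketch. It assumes the price always looks like digits, a dot, one or two digits, and a trailing "p"; `price_pattern` is an illustrative name, and the sample string is the cell from the question.

```ruby
# Sample cell taken from the question.
my_string = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'

# Assumed price format: digits, a dot, one or two digits, then a literal "p".
price_pattern = /\d+\.\d{1,2}p/

# String#[] with a regexp returns the first matching substring, or nil if none.
price = my_string[price_pattern]

p price # => "274.20p"
```

This ignores the surrounding markup entirely, so it would also pick up a price-like string in any other cell or attribute.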

score 0 · Accepted Answer · answered May 16 '10 at 16:16

Try this regular expression:

/<td[^>]*class="dataRegularUlOn"[^>]*>([^<]*)<\/td>/
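
A short sketch of how this expression might be applied in Ruby, using the sample cell from the question. The names `cell_pattern`, `price`, and `prices` are illustrative. It assumes the cell contains plain text only, since `[^<]*` stops at the first nested tag.

```ruby
# Sample cell taken from the question.
html = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'

# The expression from the answer: any attributes before and after the class,
# then the cell text captured in group 1.
cell_pattern = /<td[^>]*class="dataRegularUlOn"[^>]*>([^<]*)<\/td>/

# String#[] with a capture index returns that group, or nil when nothing matches.
price = html[cell_pattern, 1]
price = price.strip if price

# scan returns one array of captured groups per match, which helps when the
# page has several such cells.
prices = html.scan(cell_pattern).flatten.map(&:strip)

p price  # => "274.20p"
p prices # => ["274.20p"]
```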


rlandster


This is better than splitting on the end bit. Thanks a lot – Steven May 19 '10 at 17:03
