Regex to get html value based on unique class

Question

I'm using regex to parse a website in Perl. The content of the site looks like this:

(much text) </div> <div class="euroPrice"> <span>1.23</span> 
(much text) </div> <div class="euroPrice"> <span>2.34</span> (much text)

with (much text) being standard HTML. I would like to get the numbers 1.23 and 2.34, but I have to use regex. Any hints?

I tried something like this:

class="euroPrice"> <span>([\d\.]+)

But that only gave me the first one.

Ugh. Just... [don't](http://stackoverflow.com/a/1732454/82262). — Matt Jacob, Mar 08 '16 at 05:38

score 1 · Accepted Answer · answered Mar 08 '16 at 06:21


It is better to select your spans first (via an XPath expression) and then extract the numbers from them with @Tims's regex. An XPath expression that selects those spans would be:

("//div[@class='euroPrice']/span")
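As a rough sketch of how that XPath could be applied from Perl: the example below assumes the CPAN module HTML::TreeBuilder::XPath is installed (the answer does not name a module, so this choice is only an assumption), and the `$html` string is a made-up stand-in for the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # CPAN module; assumed here, not named in the answer

# Hypothetical sample input standing in for the downloaded page.
my $html = <<'HTML';
<html><body>
<div>(much text)</div> <div class="euroPrice"> <span>1.23</span> </div>
<div>(much text)</div> <div class="euroPrice"> <span>2.34</span> </div>
</body></html>
HTML

# Build a tolerant HTML tree, then select the spans with the XPath above.
my $tree   = HTML::TreeBuilder::XPath->new_from_content($html);
my @prices = map { $_->as_text } $tree->findnodes(q{//div[@class='euroPrice']/span});
$tree->delete;                  # release the tree once the values are extracted

print "$_\n" for @prices;
```

Because the tree builder is an HTML parser rather than a strict XML parser, it should cope with unclosed tags and similar HTML quirks; a number-only regex can still be applied to each extracted text if the spans may contain other characters.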


Jan


Whilst I agree, I'm not sure which modules I'd use to XPathify HTML. (My XML parsers tend to get upset by some of the HTMLisms.) – Sobrique Mar 08 '16 at 10:39

score 0 · Answer 2 · answered Mar 08 '16 at 05:32

You can iterate over the text of the website line by line and apply the following code to each line:

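# Single quotes, so the double quotes inside the HTML do not end the string.
my $line = '(much text) </div> <div class="euroPrice"> <span>1.23</span>';
# Capture the number inside the span that follows the euroPrice div.
if ($line =~ /<div class="euroPrice"> <span>(\d+\.\d+)<\/span>/) {
    print "found a number: $1 in current line\n";
}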

This solution assumes that there will be at most one match per line.
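If a line, or the whole page read into one string, may contain several prices, one possible variant is to add the `/g` modifier and match in list context, which returns every captured group instead of only the first. This is a minimal sketch using core Perl only; the `$html` sample and the `\s+` / `\s*` whitespace tolerance are assumptions, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical sample input: the two lines from the question joined into one string.
my $html = join "\n",
    '(much text) </div> <div class="euroPrice"> <span>1.23</span>',
    '(much text) </div> <div class="euroPrice"> <span>2.34</span> (much text)';

# With /g in list context, the match returns all captures, one per occurrence.
my @prices = $html =~ /<div\s+class="euroPrice">\s*<span>(\d+\.\d+)<\/span>/g;

print "$_\n" for @prices;
```

With the sample input this prints 1.23 and 2.34 on separate lines. A `while ($html =~ /.../g) { ... }` loop is the equivalent form when each match needs to be processed as it is found.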

You can explore the regular expression here:

Regex101
