
I'm using a regex to parse a website in Perl. The content of the site looks like this:

(much text) </div> <div class="euroPrice"> <span>1.23</span> 
(much text) </div> <div class="euroPrice"> <span>2.34</span> (much text)

with (much text) being standard HTML. I would like to get the numbers 1.23 and 2.34, but I have to use a regex. Any hints?

I tried something like this:

class="euroPrice"> <span>([\d\.]+)

But that only gave me the first one.

2 Answers


Better to get your spans first (via an XPath expression) and then extract the numbers with @Tim's regex. An XPath to get your spans would be:

("//div[@class='euroPrice']/span")
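As a rough sketch of how the XPath above could be applied from Perl: this assumes the CPAN module HTML::TreeBuilder::XPath is installed (it is not part of core Perl), and the sample markup and the `$html` variable are made up for illustration. It is one possible approach, not the only way to run XPath against HTML.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # CPAN module (assumption: installed separately)

# Hypothetical sample page modelled on the question's snippet.
my $html = <<'HTML';
<html><body>
<div class="item">(much text)</div> <div class="euroPrice"> <span>1.23</span> </div>
<div class="item">(much text)</div> <div class="euroPrice"> <span>2.34</span> </div>
</body></html>
HTML

# Build a tree from the HTML; this parser is tolerant of typical HTML quirks.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# findvalues returns the text value of every node matched by the XPath.
my @prices = $tree->findvalues(q{//div[@class='euroPrice']/span});
print "$_\n" for @prices;

# Free the tree explicitly once done with it.
$tree->delete;
```

Each returned value can then be checked against a numeric regex if the spans might contain anything other than a plain number.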
Jan
Whilst I agree, I'm not sure which modules I'd use to XPathify HTML. (My XML parsers tend to get upset by some of the HTMLisms.) – Sobrique Mar 08 '16 at 10:39

You can iterate over the text of the website and apply the following code to each line:

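# Single quotes so the double quotes inside the HTML need no escaping.
my $line = '(much text) </div> <div class="euroPrice"> <span>1.23</span>';
# Capture the number inside the <span> that follows the euroPrice div.
if ($line =~ /<div class="euroPrice"> <span>(\d+\.\d+)<\/span>/) {
    print "found a number: $1 in current line\n";
}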

This solution assumes that there will be at most one match per line.
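If a line, or the whole page read into a single string, can contain several prices, the `/g` modifier in list context returns every capture at once. A minimal sketch, assuming the page text is held in a variable named `$html` (a name chosen here for illustration) and that the markup matches the sample in the question:

```perl
use strict;
use warnings;

# Sample input modelled on the question; "(much text)" is a placeholder.
my $html = <<'HTML';
(much text) </div> <div class="euroPrice"> <span>1.23</span>
(much text) </div> <div class="euroPrice"> <span>2.34</span> (much text)
HTML

# With /g in list context, the match returns all captured groups,
# so every price is collected in one pass over the string.
# \s* tolerates any amount of whitespace between the div and the span.
my @prices = $html =~ m{<div class="euroPrice">\s*<span>(\d+\.\d+)</span>}g;

print "$_\n" for @prices;
```

Using `m{...}` as the delimiter avoids having to escape the slash in `</span>`.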

You can explore the regular expression here:

Regex101

Tim Biegeleisen