I have a regex in Perl that is giving me problems. I'm fairly new to Perl, but I don't think that's the cause.

Here is the code:

if ($line =~ m/<amount>(\d*\.\d{2})<\//) { $amount = $1; }

I'm essentially parsing an XML-formatted file for a single tag. Here is the specific value that I'm trying to parse:

<amount>23.00000</amount>

Can someone please explain why my regex won't work?

EDIT: I should mention I'm trying to import the amount as a currency value. The trailing 3 decimals are useless.

Rico

3 Answers

You shouldn't use regex for parsing XML, but regardless, this will fix it:

if ($line =~ m|<amount>(\d*\.\d{2})\d*</|) { $amount = $1; }
Nick Garvey

The \d*\.\d{2} regex fragment, followed directly by <\/, only matches a number with exactly two decimal places. Your sample has five decimal places, so the match fails.

You want to use \d*\.\d+ if you need at least one decimal place, or \d*\.\d{2,5} if you can have between 2 and 5 decimal places.

And you should not use back-tick characters in your regex: they have no special meaning in a regex, so they are interpreted as literal characters.

So you want to use:

if ($line =~ m/<amount>(\d*\.\d{2,5})<\/amount>/) { $amount = $1; }
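
To make the difference concrete, here is a minimal sketch that runs both patterns against the sample value from the question. The variable names ($strict, $loose) are only illustrative, and the sprintf step is an assumption about how one might trim the captured value to two decimals for a currency amount.

```perl
use strict;
use warnings;

my $line = '<amount>23.00000</amount>';

# Original pattern: exactly two decimals must be followed directly by "</".
my $strict = $line =~ m/<amount>(\d*\.\d{2})<\// ? 1 : 0;

# Relaxed pattern: between two and five decimals are allowed and captured.
my ($loose) = $line =~ m/<amount>(\d*\.\d{2,5})<\/amount>/;

# Format the captured value to two decimal places (this rounds).
my $amount = defined $loose ? sprintf('%.2f', $loose) : undef;

print "strict matched: $strict\n";   # strict matched: 0
print "loose captured: $loose\n";    # loose captured: 23.00000
print "amount: $amount\n";           # amount: 23.00
```

Note that the {2,5} version captures all five decimals, so the value still needs trimming afterwards if only two are wanted.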
Sylvain Defresne

In a regex pattern, the sequence "{2}" means match exactly two instances of the preceding pattern.

So \d{2} will only match two digits, and the pattern then expects "</" immediately afterwards, whereas your input text has five digits at that point.

If you don't want the trailing digits, then you can discard them using \d* outside the capture-parentheses.

Also, if your pattern contains slashes, consider using a different delimiter to avoid having to escape the slashes, e.g.

if ($line =~ m{<amount>(\d*\.\d{2})\d*</}) { $amount = $1; }
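
As a rough sketch of how this behaves on a few inputs, the pattern can be wrapped in a small helper. The name parse_amount is hypothetical, not something from the question. Keep in mind that dropping the extra digits truncates rather than rounds.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the amount with exactly two decimals,
# or undef if the line holds no <amount> with at least two decimals.
sub parse_amount {
    my ($line) = @_;
    # \d* outside the parentheses consumes and discards any extra decimals.
    return $line =~ m{<amount>(\d*\.\d{2})\d*</} ? $1 : undef;
}

for my $line ('<amount>23.00000</amount>',
              '<amount>1.999</amount>',
              '<amount>7.5</amount>') {
    my $amount = parse_amount($line);
    print "$line => ", $amount // 'no match', "\n";
}
```

The second input yields 1.99, not 2.00, and the third does not match because it has only one decimal place.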

Also, if you want to parse XML, then you may want to consider using an XML library such as XML::LibXML.
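
A minimal sketch of that approach, assuming the XML::LibXML module is installed. The surrounding <payment> element is made up for illustration, since the question only shows the <amount> tag.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document; only the <amount> element comes from the question.
my $xml = '<payment><amount>23.00000</amount></payment>';

my $doc = XML::LibXML->load_xml(string => $xml);

# findvalue returns the text of the first matching node ('' if none).
my $raw = $doc->findvalue('//amount');

# Format as a two-decimal currency value (this rounds rather than truncates).
my $amount = length $raw ? sprintf('%.2f', $raw) : undef;

print "$amount\n" if defined $amount;   # 23.00
```

This avoids depending on how the tag is laid out in the file, such as attributes or line breaks inside the element.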

zgpmax