Perl Regex (\d*\.\d{2})

Question

I have run into a regex in Perl that seems to be giving me problems. I'm fairly new to Perl - but I don't think that's my problem.

Here is the code:

if ($line =~ m/<amount>(\d*\.\d{2})<\//) { $amount = $1; }

I'm essentially parsing an XML formatted file for a single tag. Here is the specific value that I'm trying to parse.

<amount>23.00000</amount>

Can someone please explain why my regex won't work?

EDIT: I should mention I'm trying to import the amount as a currency value. The trailing 3 decimals are useless.

You are only matching TWO decimal places, where there are 5 in your text — Neverever, Jan 16 '12 at 22:21
Sorry, there were no back-ticks in the actual regex. For some reason the site was ignoring in my "code" so I put those in. — Rico, Jan 16 '12 at 22:38
@Rico It is because you were using blockquote instead of code sample. — TLP, Jan 16 '12 at 22:48

score 5 · Accepted Answer · edited May 23 '17 at 10:24

You shouldn't use regex for parsing XML, but regardless this will fix it:

if ($line =~ m|<amount>(\d*\.\d{2})\d*</|) { $amount = $1; }


answered Jan 16 '12 at 22:22

Nick Garvey


score 5 · Answer 2 · answered Jan 16 '12 at 22:29

The \d*\.\d{2} regex fragment only recognizes a number with exactly two decimal places. Because your pattern requires </ immediately after those two digits, your sample, which has five decimal places, does not match.
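To see the failure concretely, here is a minimal sketch that runs the pattern from the question against a two-decimal value and against the five-decimal sample. The helper name question_capture is made up for this illustration and is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured amount, or undef when the
# pattern from the question does not match.
sub question_capture {
    my ($line) = @_;
    # Exactly two decimal digits, and "</" must follow them immediately.
    return $line =~ m/<amount>(\d*\.\d{2})<\// ? $1 : undef;
}

for my $line ('<amount>23.00</amount>', '<amount>23.00000</amount>') {
    my $amount = question_capture($line);
    print defined $amount ? "matched:  $amount\n" : "no match: $line\n";
}
```

The first line matches and captures 23.00. The second does not match, because after the first two decimal digits the next character is another 0 rather than <.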

You want to use \d*\.\d+ if you need at least one decimal place, or \d*\.\d{2,5} if you can have between 2 and 5 decimal places.

Also, you should not use back-tick characters in your regex: they have no special meaning there, so they are matched as literal characters.

So you want to use:

if ($line =~ m/<amount>(\d*\.\d{2,5})<\/amount>/) { $amount = $1; }
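Note that with {2,5} the capture holds all of the decimals that were present (23.00000 for the sample), so if you want a two-decimal currency value you still need to format it afterwards. A rough sketch, assuming that rounding via sprintf is acceptable for your data; the helper name amount_2_to_5 is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern above: captures 2 to 5 decimals.
sub amount_2_to_5 {
    my ($line) = @_;
    return $line =~ m/<amount>(\d*\.\d{2,5})<\/amount>/ ? $1 : undef;
}

my $raw      = amount_2_to_5('<amount>23.00000</amount>');  # all five decimals
my $currency = sprintf('%.2f', $raw);                       # formatted to two decimals
print "$raw -> $currency\n";
```

Keep in mind that sprintf rounds rather than truncates, so this is only a fit if rounding is what you want for the extra digits.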

score 0 · Answer 3 · answered Jan 21 '12 at 22:56

In a regex pattern, the sequence "{2}" means match exactly two instances of the preceding pattern.

So \d{2} will only match two digits, whereas your input text had five digits at that point.

If you don't want the trailing digits, then you can discard them using \d* outside the capture-parentheses.

Also, if your pattern contains slashes, consider using a different delimiter to avoid having to escape the slashes, e.g.

if ($line =~ m{<amount>(\d*\.\d{2})\d*</}) { $amount = $1; }
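As a quick check of that pattern, here is a small sketch that tries it on a few values. The helper name amount_two_decimals and the sample values other than 23.00000 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: captures the first two decimals and lets the
# trailing \d* swallow any further digits outside the capture group.
sub amount_two_decimals {
    my ($line) = @_;
    return $line =~ m{<amount>(\d*\.\d{2})\d*</} ? $1 : undef;
}

for my $value ('23.00000', '0.129', '7.5') {
    my $amount = amount_two_decimals("<amount>$value</amount>");
    printf "%-8s => %s\n", $value, defined $amount ? $amount : 'no match';
}
```

This gives 23.00 for the sample and 0.12 for 0.129, so the extra digits are truncated, not rounded. A value with fewer than two decimals, such as 7.5, does not match at all.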

Also, if you want to parse XML, then you may want to consider using an XML library such as XML::LibXML.
