How to figure out this GREP regex pattern?

Question

Somewhere in a large HTML file:

<td headers="fee" style="cursor:pointer;" onclick="toggle('detailinfo088180');">
            $675.00 
        </td>

blabla<br><em>$650</em>">blabla/a>
    </td>
  </tr>

I need a grep command that outputs only the number '675.00'. I tried regexes like $[0..9].* but they don't work.

Thanks,

So, the fact that it's within HTML isn't especially relevant; you're just searching for a number that follows a dollar sign, right? — Wiseguy, Apr 05 '12 at 05:23
Dare I dupe it.... yes, yes I do. [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Amber, Apr 05 '12 at 05:24
Amber, trivial data extraction from HTML/XML is possible and feasible with regex. This question is not at all about trying to match the tag structure. To clarify again: What they are searching for is very much regular, thus your linked/duped question doesn't apply at all. — Joey, Apr 05 '12 at 06:05
Have amended my answer to hopefully answer all of your question. — Michael Slade, Apr 05 '12 at 06:43

dash1e · Answer 1 · 2012-04-05T07:29:18.973

1

Try this

grep -e "\$[0-9]\{1,\}\.[0-9]\{2\}"

I included the "$" so the pattern matches more precisely; you can strip it afterwards by piping the grep output through another command.

If you need to extract only the number, you could use perl instead of grep:

perl -ne '/\$([0-9]+\.([0-9]+))/ && print "$1\n"' < yourfile
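If your grep supports the -o option, the number can also be pulled out without switching tools. The following is only a sketch: it assumes GNU or BSD grep (-o is not required by POSIX), and the function name extract_prices and the file name page.html are made up for illustration.

```shell
# Hypothetical helper: print every dollar amount in the input without the "$".
# Assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it).
extract_prices() {
    # -o prints only the matched part of each line; -E enables + and {2}.
    # Single quotes keep the backslash so grep sees \$ (a literal dollar sign).
    grep -oE '\$[0-9]+\.[0-9]{2}' | tr -d '$'
}

# Usage with a hypothetical file:
#   extract_prices < page.html
printf '%s\n' '<td headers="fee">' '            $675.00 ' '</td>' | extract_prices
# prints 675.00
```

Like the pattern above, this requires two decimal places, so an amount written as $650 is skipped.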

edited Apr 05 '12 at 07:29

answered Apr 05 '12 at 05:31


I updated the answer with a suggestion to use perl instead of grep. – dash1e Apr 05 '12 at 07:30

score 1 · Accepted Answer · answered Apr 05 '12 at 05:31

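You want to use a hyphen (-), not .., to signify a range. You also need to escape the $ literal because it otherwise means end-of-line.

This should do it: grep -E '\$[0-9]+' (the -E makes + a quantifier rather than a literal plus sign, and the single quotes stop the shell from removing the backslash).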
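To see both points in action, here is a small sketch that counts matching lines for the attempted pattern and for a corrected one. It assumes a grep with -E (extended regular expressions); the helper name count_matches is hypothetical.

```shell
# Hypothetical helper: print how many input lines match the extended regex in $1.
count_matches() {
    # grep -c prints the count but exits with status 1 when it is zero;
    # "|| true" keeps that from being treated as a failure.
    grep -cE -- "$1" || true
}

line='            $675.00 '

# Attempted pattern: the unescaped $ anchors the end of the line, and
# [0..9] is a set containing only "0", "." and "9", not a digit range.
printf '%s\n' "$line" | count_matches '$[0..9].*'    # prints 0

# Corrected pattern: \$ is a literal dollar sign, [0-9] is any digit.
printf '%s\n' "$line" | count_matches '\$[0-9]+'     # prints 1
```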

phatfingers


It works but the output is the entire line, I just want the number. – Bebeoix Apr 05 '12 at 05:41
Grep by itself won't give you what you want then... it returns whole lines based on a match. @Wiseguy -- I didn't forget the decimal... just didn't see the need to be that specific. – phatfingers Apr 05 '12 at 05:50
You might try something like `grep -E '\$[0-9]+' | sed 's/[^$]*\(\$[0-9.-]*\).*/\1/'` – phatfingers Apr 05 '12 at 05:59

Michael Slade · Answer 3 · 2012-04-05T06:43:03.057

This would work to extract the number from the inner HTML of that '<td>':

/[0-9.]+/

The other part of the problem is getting the HTML with the price in it. Here is a more complete example:

<html>
<head>
<script>
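    function toggle(e,id) { 
        // match() returns an array; parseFloat() converts it to a string first.
        // "var" keeps val local instead of creating an implicit global.
        var val = parseFloat(e.innerHTML.match(/[0-9.]+/));
        // Another method (the capture group excludes the "$"):
        // var val = parseFloat(e.innerHTML.match(/\$([0-9.]+)/)[1]);
        alert(val);
    }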
</script>
</head>
<body>

<table border=1><tr>
<td headers="fee" style="cursor:pointer;" onclick="toggle(this,'detailinfo088180');">
   $675.04 
</td>

blabla<br><em>$650</em>">blabla/a>
    </td>
  </tr>

</table>
</body>
</html>

Note the following:

The toggle() function takes an extra parameter, which is the element that was actually clicked. (Assuming you want the price to be extracted from the clicked element)
I have provided another regular expression that is more restrictive (must have a "$" at the front of the number) in case this is what you need. The expression makes use of capturing ("(..)") to match a string and extract a portion of the string instead of the entire string.
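The same capturing idea carries over to the command line, which is where the question started. A minimal sketch using sed, assuming each price sits on its own line as in the sample; the function name first_price is hypothetical.

```shell
# Hypothetical helper: print the number that follows the first "$" on each line.
first_price() {
    # -n suppresses the default output; the trailing p prints only lines where
    # the substitution succeeded. \(...\) is the capture group and \1 refers
    # back to it, so the "$" itself is matched but not printed.
    sed -n 's/^[^$]*\$\([0-9][0-9.]*\).*/\1/p'
}

printf '%s\n' '<td headers="fee">' '   $675.04 ' '</td>' | first_price
# prints 675.04
```

Lines without a dollar sign produce no output, so the surrounding tags are dropped.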

If you want to know more about how regular expressions work, try here. Or Google.

I think you want /\$[0-9.]+/ because it seems the dollar sign in this question is significant. — jwir3, Apr 05 '12 at 05:28
