Somewhere in a large HTML file:

<td headers="fee" style="cursor:pointer;" onclick="toggle('detailinfo088180');">
            $675.00 
        </td>

blabla<br><em>$650</em>">blabla/a>
    </td>
  </tr>

I need to get only the number '675.00' out of a grep command. I tried a regex like $[0..9].* but it doesn't work.

Thanks,

Bebeoix
  • So, the fact that it's within HTML isn't especially relevant; you're just searching for a number that follows a dollar sign, right? – Wiseguy Apr 05 '12 at 05:23
  • Dare I dupe it.... yes, yes I do. [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Amber Apr 05 '12 at 05:24
  • Amber, trivial data extraction from HTML/XML is possible and feasible with regex. This question is not at all about trying to match the tag structure. To clarify again: What they are searching for is very much regular, thus your linked/duped question doesn't apply at all. – Joey Apr 05 '12 at 06:05
  • Have amended my answer to hopefully answer all of your question. – Michael Slade Apr 05 '12 at 06:43

3 Answers

Try this

grep -e "\$[0-9]\{1,\}\.[0-9]\{2\}"

I included the "$" to make the pattern more specific; you can strip it after grep matches by piping the output through another command.
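As a rough sketch of that extra step: the pipeline below assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it) and uses tr to delete the dollar sign. The sample variable is only a stand-in for the HTML file.

```shell
# Stand-in for one line of the HTML file from the question.
sample='            $675.00 '

# -o prints only the matched text instead of the whole line.
# tr -d '$' then deletes the dollar sign from that match.
price=$(printf '%s\n' "$sample" | grep -o '\$[0-9]\{1,\}\.[0-9]\{2\}' | tr -d '$')

echo "$price"
```

Because the pattern requires a decimal point and two digits, a value such as $650 would not be matched by this sketch.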

If you need to extract only the number, you could use perl instead of grep:

perl -ne '/\$([0-9]+\.([0-9]+))/ && print "$1\n"' < yourfile
dash1e

You want to use a hyphen - not .. to signify a range. You also need to escape the $ literal because it otherwise means end-of-line.

This should match it (-E enables the + quantifier): grep -E '\$[0-9]+'
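To illustrate the range point, here is a small check, assuming any POSIX-style grep. It counts how many input lines each bracket expression matches when the input is the single digit 6.

```shell
# [0..9] is a bracket expression holding just the characters "0", "." and "9",
# so the digit 6 is not matched. grep -c prints the number of matching lines;
# "|| true" ignores grep's non-zero exit status when nothing matches.
dots=$(printf '6\n' | grep -c '[0..9]' || true)

# [0-9] is a range that covers every digit from 0 to 9.
hyphen=$(printf '6\n' | grep -c '[0-9]' || true)

echo "lines matched by [0..9]: $dots"
echo "lines matched by [0-9]:  $hyphen"
```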

phatfingers
  • It works but the output is the entire line, I just want the number. – Bebeoix Apr 05 '12 at 05:41
  • Grep by itself won't give you what you want then... it returns whole lines based on a match. @Wiseguy -- I didn't forget the decimal... just didn't see the need to be that specific. – phatfingers Apr 05 '12 at 05:50
  • You might try something like `grep "\$[0-9]+" | sed "s/[^$]*\(\$[0-9.-]\).*/\1/"` – phatfingers Apr 05 '12 at 05:59

This would work to extract the number from the inner HTML of that '<td>':

/[0-9.]+/

The other part of the problem is getting the HTML with the price in it. Here is a more complete example:

<html>
<head>
<script>
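    function toggle(e, id) {
        // e is the clicked element; id is passed by the onclick handler but not used here.
        // match() returns an array whose first entry is the first run of digits and dots.
        var val = parseFloat(e.innerHTML.match(/[0-9.]+/)[0]);
        // Another method:
        // var val = parseFloat(e.innerHTML.match(/\$([0-9.]+)/)[1]);
        alert(val);
    }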
</script>
</head>
<body>

<table border=1><tr>
<td headers="fee" style="cursor:pointer;" onclick="toggle(this,'detailinfo088180');">
   $675.04 
</td>

blabla<br><em>$650</em>">blabla/a>
    </td>
  </tr>

</table>
</body>
</html>

Note the following:

  • The toggle() function takes an extra parameter, which is the element that was actually clicked. (Assuming you want the price to be extracted from the clicked element)
  • I have provided another regular expression that is more restrictive (must have a "$" at the front of the number) in case this is what you need. The expression makes use of capturing ("(..)") to match a string and extract a portion of the string instead of the entire string.
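Since the question asked for a command-line approach, the same capturing idea might be sketched with sed, where \( and \) mark the captured portion and \1 refers back to it. The sample variable is a stand-in for the cell's contents, not part of the original page.

```shell
# Stand-in for the text inside the table cell.
sample='   $675.04 '

# \$ matches a literal dollar sign; \( ... \) captures the number after it.
# The whole line is replaced by the captured text (\1).
# -n together with the p flag prints only lines where the substitution happened.
number=$(printf '%s\n' "$sample" | sed -n 's/.*\$\([0-9][0-9.]*\).*/\1/p')

echo "$number"
```

Lines without a dollar sign followed by a digit produce no output, which mirrors the more restrictive expression described above.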

If you want to know more about how regular expressions work, try here. Or Google.

Michael Slade
  • I think you want /\$[0-9.]+/ because it seems the dollar sign in this question is significant. – jwir3 Apr 05 '12 at 05:28