
Is it possible to grep all prices from a file and list only the prices in the output? A price begins with "$" and may contain digits, "," and ".".

I've tried the best solutions from this question, but they output either the whole file or the entire line containing a price.

The pattern I use is simple: \$

The web page I want to grep: http://www.ned.org/

Example of the page source:

<p><strong>Better Understanding Public Attitudes and Opinions</strong>
</p>
<p>Democratic Ideas and Values</p>
<p>$43,270</p>
<p>To monitor and better understand public views on key social, political, and economic developments. Citizens’ opinions will be tracked, documented, and studied ahead of and after the country’s September 2016 parliamentary elections. The results and accompanying analysis will be disseminated through print and electronic publications, a website, and independent media.</p>
<p><strong> </strong></p>

From this piece of HTML I want to output something like 43,270, or maybe 43270. I'm just too lazy to write a parser :)

kelin

1 Answer


Something like this seems to work fine for my tests:

$ echo "$prices"
tomato $30.10
potato $19.1
apples=$2,222.1
oranges:$1
peach="$22.1",discount 10%,final price=$20

$ grep -Eo '\$[0-9]+([.,][0-9]+)*' <<<"$prices"
$30.10
$19.1
$2,222.1
$1
$22.1
$20

Real test with your web page:

$ links -dump "http://www.ned.org/region/central-and-eastern-europe/belarus-2016/" | grep -Eo '\$[0-9]+([.,][0-9]+)*'
$43,270
$25,845
$55,582
$14,940
$44,100
$35,610
$54,470
$60,200
$33,150
$15,720
$35,160
$45,500
$72,220
$26,330
$53,020
$27,710
$22,570
$40,145
# more prices follow below
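
The question asks for output like 43,270 or 43270, without the leading "$". One way to get there is to pipe the matches through tr. This is only a sketch: the function name extract_prices is made up for illustration, grep -E is used as the equivalent of egrep, and deleting commas assumes they are thousands separators rather than decimal marks.

```shell
# Hypothetical helper: reads text on stdin, prints one bare price per line.
extract_prices() {
    # -E: extended regex, -o: print only the matched part of each line
    grep -Eo '\$[0-9]+([.,][0-9]+)*' |
        tr -d '$,'    # delete the dollar sign and the commas
}

printf '%s\n' '<p>$43,270</p>' 'apples=$2,222.1' | extract_prices
# prints:
# 43270
# 2222.1
```

To keep the commas (43,270 instead of 43270), delete only the dollar sign with tr -d '$'.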
George Vasiliou