Grep: Capture just number

Question

I am trying to use grep to just capture a number in a string but I am having difficulty.

echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"

How am I supposed to have it return just "54"? I have tried the above grep command and it doesn't work.

echo "You have <strong>54</strong>" | grep -o '[0-9]' seems to sort of work but it prints

5
4

instead of 54

score 1 · Answer 1 · edited May 23 '17 at 11:57

Don't parse HTML with regex, use a proper parser :

$ echo "There are <strong>54</strong> cities " |
    xmllint --html --xpath '//strong/text()' -

OUTPUT:

54

See also: "RegEx match open tags except XHTML self-contained tags"

answered Jan 07 '15 at 19:28

Gilles Quénot

Khanna111 · Accepted Answer · 2015-01-07T21:54:57.853

You need the "E" option for extended regex support (or use egrep). On my Mac OS X:

$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
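
Some background on why the original pattern returned nothing: without "E", grep uses basic regular expressions, where ( ) and + are ordinary characters, so "([0-9]+)" looks for a literal parenthesis, one digit, a literal plus sign and a closing parenthesis. A minimal sketch of both spellings follows; the variable names are only for illustration.

```shell
# Sample input taken from the question.
line='There are <strong>54</strong> cities'

# Basic regex (no -E): "one digit, then zero or more digits".
bre=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*')

# Extended regex (-E): + means "one or more of the preceding item".
ere=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')

echo "$bre"
echo "$ere"
```

Both commands should print 54; the basic-regex form is handy on systems where -E is unavailable.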

You also need to consider whether a line may contain more than one number. What should the behavior be then?
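
To illustrate the multiple-number case: -o prints every match on its own line, so a second number on the line comes back as well. The sketch below uses a made-up input line and, as one possible policy, keeps only the first match with head.

```shell
# Hypothetical input with two numbers on one line.
line='There are <strong>54</strong> cities and 12 towns'

# Every match is printed, one per line.
all=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')

# One possible policy: keep only the first match.
first=$(printf '%s\n' "$line" | grep -Eo '[0-9]+' | head -n 1)

echo "$all"
echo "$first"
```

Whether the first, the last, or all matches are wanted depends on the page being processed.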

EDIT 1: Since you have now specified that the number must be between <strong> tags, I would recommend using sed. On my platform, grep does not have the "P" option for Perl-style regexes, and on my other box grep describes it as an experimental feature, so I would go with sed in this case.

$  echo "There are <strong>54</strong> 12 cities" | sed  -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54

Here "r" enables extended regex, and "n" together with the trailing "p" prints only the lines where the substitution succeeded.
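
One portability note: -r and \s are GNU extensions, so the command above may fail on BSD or macOS sed. The following is a sketch of the same substitution, assuming a sed that accepts -E (current GNU and BSD versions both do) and using the POSIX class [[:space:]] in place of \s.

```shell
# Sample input from the answer above.
line='There are <strong>54</strong> 12 cities'

# -E: extended regex; -n plus the trailing p: print only on a successful match.
num=$(printf '%s\n' "$line" |
  sed -En 's/^.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*$/\1/p')

echo "$num"
```

The stray 12 outside the tags is ignored because the pattern only captures digits enclosed by <strong> and </strong>.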

EDIT 2: If you have the "PCRE" option in your version of grep, you could also utilize the following with positive lookbehinds and lookaheads.

$  echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54

Well, there is a lot more text on the page and this code will return _all_ the numbers. Is there a way to just retrieve the number that is between the `<strong>` tags? - Bijan, Jan 07 '15 at 19:27
