
I am trying to use grep to capture just a number in a string, but I am having difficulty.

echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"

How am I supposed to have it return just "54"? I have tried the above grep command, and it doesn't work.

echo "You have <strong>54</strong>" | grep -o '[0-9]' sort of works, but it prints

5
4

instead of 54

Bijan

2 Answers


Don't parse HTML with regex; use a proper parser:

$ echo "There are <strong>54</strong> cities " |
    xmllint --html --xpath '//strong/text()' -

OUTPUT:

54

See the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".

Gilles Quénot

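You need to use the -E option for extended regex support (or use egrep, which is equivalent to grep -E). Without it, grep uses basic regular expressions, in which + and unescaped parentheses are ordinary characters, so "([0-9]+)" only matches literal text such as "(5+)". On my Mac OS X: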

$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
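For comparison, here is a minimal sketch of the same extraction with and without -E. In POSIX basic regular expressions + is not an operator, so the usual BRE spelling of "one or more digits" is [0-9][0-9]* (one digit, then zero or more digits). The helper names digits_bre and digits_ere are hypothetical, and grep -o itself is a GNU/BSD extension rather than POSIX, just as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print every run of digits in $1, one per line.
# Basic regex (no -E): "[0-9][0-9]*" = one digit followed by zero or more digits.
digits_bre() {
    printf '%s\n' "$1" | grep -o '[0-9][0-9]*'
}

# Extended regex (-E): "+" is an operator, so "[0-9]+" means the same thing.
digits_ere() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+'
}

digits_bre 'There are <strong>54</strong> cities'   # prints 54
digits_ere 'There are <strong>54</strong> cities'   # prints 54
```

Either form keeps the whole run of digits together, which is why neither splits 54 into 5 and 4 the way '[0-9]' does.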

You also need to consider whether the line can contain more than one number. What should happen then?
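As a minimal sketch of the usual choices, assuming the input is a single line: grep -Eo prints every number on its own line, and piping that through head or tail keeps only the first or the last one. The helper names all_numbers, first_number and last_number are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a line that may contain several numbers.
# head/tail act on the whole stream, so if $1 spans several lines these
# return the first/last number overall, not per line.
all_numbers() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+'
}

first_number() {
    all_numbers "$1" | head -n 1
}

last_number() {
    all_numbers "$1" | tail -n 1
}

line='There are <strong>54</strong> cities in 12 regions'
all_numbers "$line"    # prints 54 and 12, one per line
first_number "$line"   # prints 54
last_number "$line"    # prints 12
```

Picking by position only works if you know where the wanted number sits; the tag-based approaches below avoid relying on that.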

EDIT 1: Since you have now specified that the number must be between <strong> tags, I would recommend using sed. On my platform, grep does not have the -P option for Perl-style regexes, and on my other box the grep documentation describes it as experimental, so I would go with sed in this case.

$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54

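Here -r enables extended regular expressions in GNU sed; -E is the more portable spelling, accepted by both GNU sed and BSD/macOS sed.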
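A minimal sketch that wraps the same sed substitution in a reusable function, assuming you want it to run on both GNU and BSD/macOS sed: it uses -E instead of -r and the POSIX class [[:space:]] instead of \s, which is a GNU extension. The function name strong_number is hypothetical. Because ^.* is greedy, a line containing several <strong> numbers yields only the last one.

```shell
#!/bin/sh
# Hypothetical helper: for each input line containing <strong>NUMBER</strong>,
# print NUMBER. -n plus the trailing "p" flag suppresses every other line.
# The greedy "^.*" makes the match land on the LAST <strong> on the line.
strong_number() {
    sed -En 's/^.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*$/\1/p'
}

printf '%s\n' 'There are <strong>54</strong> 12 cities' | strong_number   # prints 54
```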

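EDIT 2: If your version of grep has the -P (PCRE) option, you could also use a positive lookbehind and lookahead. The \K drops any whitespace after <strong> from the printed match, and keeping the trailing \s* inside the lookahead stops -o from printing the space before </strong>.

$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P '(?<=<strong>)\s*\K[0-9]+(?=\s*</strong>)'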
54
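If grep -P is not available (as on the Mac mentioned above), a minimal sketch of a portable alternative is a two-stage pipeline: first keep only the <strong>...</strong> fragments, then keep just their digits. Unlike the sed version, this prints every such number, even several per line. The helper name strong_numbers is hypothetical, and this is still a regex approach, so it will miss variants such as <strong class="x"> that the parser-based answer handles.

```shell
#!/bin/sh
# Hypothetical helper: print every number wrapped in <strong>...</strong>,
# one per line, without grep -P.
# Stage 1: -o emits each "<strong> 54 </strong>" fragment on its own line.
# Stage 2: keep only the digits ("strong" itself contains none).
strong_numbers() {
    grep -Eo '<strong>[[:space:]]*[0-9]+[[:space:]]*</strong>' |
        grep -Eo '[0-9]+'
}

printf '%s\n' 'There are <strong>54 </strong> 12 cities' | strong_numbers   # prints 54
```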


Khanna111
  • Well, there is a lot more text on the page, and this code will return _all_ the numbers. Is there a way to retrieve just the number between the <strong> tags? – Bijan Jan 07 '15 at 19:27