how to extract substring and numbers only using grep/sed

Question

I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file like this:

miss rate 0.21  
ipc 222  
stalls n shdmem 112

Say I only want to extract the value for miss rate, which is 0.21. How do I do that with grep or sed? I also need more than one number, not only the one after miss rate; for example, I may want to get both 0.21 and 112. A sample output might look like this:

0.21 222 112

I need the data for a later plot.

sed is also acceptable if it works more elegantly in this case. — Hooloovoo, Mar 12 '13 at 23:24

score 7 · Answer 1 · answered Mar 12 '13 at 20:43

If you really want to use only grep for this, you can try:

grep "miss rate" file | grep -oE '[0-9.]+'

It first finds the matching line and then outputs only the number (the run of digits and dots).
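
The question also asks for several numbers on one line. Here is a minimal grep-only sketch, assuming the three labels from the sample file and that no label itself contains digits: select the labelled lines, print each number on its own line with -o (supported by GNU and BSD grep, though not required by POSIX), then join them with paste. The function name extract_numbers is hypothetical.

```shell
# Hypothetical helper: reads the report on standard input.
extract_numbers() {
  # 1. keep only the labelled lines
  # 2. print every number on its own line (-o)
  # 3. join those lines with single spaces
  grep -E '^(miss rate|ipc|stalls n shdmem) ' |
    grep -oE '[0-9]+(\.[0-9]+)?' |
    paste -s -d ' ' -
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_numbers
# prints: 0.21 222 112
```

With a real file, run it as extract_numbers < file.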

Sed might be a bit more readable, though:

sed -n 's#miss rate ##p' file
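
To pull several values with sed in one pass, you can use a single substitution that matches any of the labels and captures the number. This sketch assumes your sed supports -E for extended regular expressions (GNU and BSD sed do) and uses the labels from the sample file. The name extract_with_sed is hypothetical. It prints one value per line.

```shell
# Hypothetical helper: reads the report on standard input.
extract_with_sed() {
  # \1 = the label, \2 = the number; -n plus the p flag prints only
  # lines where the substitution succeeded.
  sed -n -E 's/^(miss rate|ipc|stalls n shdmem)[[:space:]]+([0-9]+(\.[0-9]+)?).*/\2/p'
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_with_sed
# prints 0.21, 222 and 112, one per line
```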

that other guy · Accepted Answer · score 5 · 2013-03-12T20:42:47.897

Use awk instead:

awk '/^miss rate/ { print $3 }' yourfile
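
The labels contain a varying number of words, so printing the last field ($NF) is easier than counting to a fixed field number. This sketch assumes the value is always the last field on its line and uses the labels from the sample file. The name extract_with_awk is hypothetical. It prints all matched values on one line, separated by spaces.

```shell
# Hypothetical helper: reads the report on standard input.
extract_with_awk() {
  awk '
    /^(miss rate|ipc|stalls n shdmem) / {
      printf "%s%s", sep, $NF   # $NF is the last whitespace-separated field
      sep = " "                 # separator is empty before the first value
    }
    END { print "" }            # finish the line with a newline
  '
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_with_awk
# prints: 0.21 222 112
```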

To do it with grep alone, you need non-standard extensions, as here with GNU grep: PCRE syntax (-P), a positive lookbehind (?<=...) and match-only output (-o):

grep -Po '(?<=miss rate ).*' yourfile

edited Mar 12 '13 at 20:42

answered Mar 12 '13 at 20:35

that other guy

score 4 · Answer 3 · edited May 23 '17 at 12:02

Using the \K escape of the PCRE engine with GNU grep (\K discards the text matched so far, so only what comes after it is printed):

grep -oP 'miss rate \K.*' file.txt

or with Perl:

perl -lne 'print $& if /miss rate \K.*/' file.txt

edited May 23 '17 at 12:02

Community

answered Mar 12 '13 at 21:03

Gilles Quénot

Added a portable Perl solution =) – Gilles Quénot Mar 12 '13 at 21:48
The \K trick is really helpful. Yes, I prefer grep for this since I am not an expert in awk. Also, a problem with awk is the field separator, since a label can span a varying number of space-separated words, as in 'miss rate XX' and 'stalls total number XXX'. – Hooloovoo Mar 12 '13 at 23:20

score 4 · Answer 4 · answered Mar 12 '13 at 22:05

A grep-and-cut solution would look like this. To get the third field of every matching line, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3

Or, to get the third field and everything after it, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3-

Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:

a=( $(grep -m 1 "miss rate" yourfile) )
echo "${a[2]}"

where ${a[2]} is your result.

If "miss rate" occurs more then once you can loop over the grep output reading only what you need. (in bash)

score 0 · Answer 5 · answered Mar 12 '13 at 20:36

You can use:

grep -P "miss rate \d+(\.\d+)?" file.txt

or:

grep -E "miss rate [0-9]+(\.[0-9]+)?"

Both of those commands will print the whole matching line, miss rate 0.21. If you want to extract only the number, why not use Perl, sed or awk?

If you really want to avoid those, this pipeline splits the matching line into words and keeps the last one:

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 | tail -n 1

score 0 · Answer 6 · answered Mar 13 '13 at 00:01

0

I believe

sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename

will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a super master of all things sed.
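
sed alone can produce a space-delimited list. Append each extracted number to the hold space, then print the collected list once when the last line is reached. This sketch assumes POSIX sed and labels that contain no digits. The name numbers_on_one_line is hypothetical.

```shell
# Hypothetical helper: reads the report on standard input.
# For each line containing a digit: keep only the first number and
# append it (after a newline) to the hold space.
# On the last line: swap the hold space in, turn the newlines into
# spaces, drop the leading space and print the result.
numbers_on_one_line() {
  sed -n '
    /[0-9]/ {
      s/^[^0-9]*\([0-9.]*\).*/\1/
      H
    }
    $ {
      x
      s/\n/ /g
      s/^ //
      p
    }
  '
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | numbers_on_one_line
# prints: 0.21 222 112
```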

answered Mar 13 '13 at 00:01

Daniel Williams

I adapted this a bit to pull a 5-digit ticket number (always the first 5 digits on the line) using \([0-9][0-9][0-9][0-9][0-9]\) as the capture group. – englebart Jul 26 '19 at 19:53
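
The following sketch shows what that adaptation might look like. The input line and the helper name are made up for illustration. [^0-9]* skips everything before the first digit, and the capture group keeps the next five digits. A line whose first run of digits is shorter than five produces no output.

```shell
# Hypothetical helper: reads lines on standard input.
ticket_numbers() {
  # -n with the p flag prints only lines where five leading digits were found.
  sed -n 's/^[^0-9]*\([0-9][0-9][0-9][0-9][0-9]\).*/\1/p'
}

printf 'Ticket 12345 opened\n' | ticket_numbers
# prints: 12345
```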
