sed extracting group of digits

Question

I tried to extract a number as shown below, but no number is printed:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9]*\) apples/\1/p'

However, I get '65' if each digit is matched explicitly, as shown below:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9][0-9]\) apples/\1/p'
65

How can I match a number when I don't know in advance how many digits it has? For example, it could be 2344 instead of 65.

score 29 · Accepted Answer · answered Feb 13 '12 at 12:42

$ echo "This is an example: 65 apples" | sed -r  's/^[^0-9]*([0-9]+).*/\1/'
65
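
If your sed does not accept -r, the same idea can be written with basic regular expressions only. The following is a minimal sketch under that assumption; `extract_first_number` is a hypothetical helper name used for illustration, not part of the answer above:

```shell
# extract_first_number is a hypothetical helper name.
# Only POSIX basic regular expressions are used: \( \) forms the group and
# [0-9][0-9]* stands in for [0-9]+, so no -r option is needed.
extract_first_number() {
    printf '%s\n' "$1" | sed 's/^[^0-9]*\([0-9][0-9]*\).*/\1/'
}

extract_first_number "This is an example: 65 apples"     # prints 65
extract_first_number "This is an example: 2344 apples"   # prints 2344
```

Note that if the input contains no digits, the substitution does not match and the line is printed unchanged.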

codaddict


+1, but beware that not all sed implementations support -r; those that don't cannot use the '+' modifier and require escaped parens. – William Pursell Feb 13 '12 at 12:51

Why doesn't a regex like [`([0-9]*) apple`](http://sprunge.us/feGV) work in sed? It works just fine in Python. – shadyabhi Feb 13 '12 at 12:54

So ^[^0-9]* corresponds to everything non-digit at the start of the line, and [0-9]+ to one or more digits, right? – Uthman Feb 13 '12 at 12:55

@AbhijeetRastogi: Since we are using **substitution**, we need to account for the entire line. Any part of the line not matched by the pattern will remain in the output. This is not the case with a pattern search (as opposed to substitution), as in your Python example. – codaddict Feb 13 '12 at 13:04

@codaddict Oops. My bad. Silly me. It's substitution. Thanks. – shadyabhi Feb 13 '12 at 13:25
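
The point about substitution in the comments above can be checked with a small sketch. The variable names are illustrative only, and the patterns use basic regular expressions so that no -r is assumed:

```shell
line="This is an example: 65 apples"

# The pattern covers only "65 apples", so the unmatched prefix survives.
partial=$(printf '%s\n' "$line" | sed 's/\([0-9][0-9]*\) apples/\1/')
printf '%s\n' "$partial"   # prints: This is an example: 65

# The pattern covers the whole line, so only the captured digits remain.
whole=$(printf '%s\n' "$line" | sed 's/^[^0-9]*\([0-9][0-9]*\) apples.*$/\1/')
printf '%s\n' "$whole"     # prints: 65
```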

score 6 · Answer 2 · answered Feb 13 '12 at 12:43

It's because your first .* is greedy, and your [0-9]* allows 0 or more digits. Hence the .* gobbles up as much as it can (including the digits) and the [0-9]* matches nothing.

You can do:

echo "This is an example: 65 apples" | sed -n  's/.*\b\([0-9]\+\) apples/\1/p'

where I forced the [0-9] to match at least one digit, and also added a word boundary before the digits so the whole number is matched.
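
To see the greedy behaviour directly, here is a small sketch that wraps the captured group in brackets so that an empty capture becomes visible. The variable names are illustrative. The third variant swaps \b (a GNU extension, as far as I know) for a digit-free prefix, which is a different technique from the one in this answer:

```shell
line="This is an example: 65 apples"

# Greedy .* swallows both digits; [0-9]* is satisfied with nothing.
empty=$(printf '%s\n' "$line" | sed -n 's/.*\([0-9]*\) apples/[\1]/p')

# Requiring at least one digit makes .* give back exactly one digit.
last_digit=$(printf '%s\n' "$line" | sed -n 's/.*\([0-9][0-9]*\) apples/[\1]/p')

# A prefix that cannot contain digits leaves the whole number to the group.
full=$(printf '%s\n' "$line" | sed -n 's/^[^0-9]*\([0-9][0-9]*\) apples/[\1]/p')

printf '%s %s %s\n' "$empty" "$last_digit" "$full"   # prints: [] [5] [65]
```

The middle case shows why forcing one digit is not enough on its own: something must also stop the leading .* from consuming the first digits of the number.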

However, it's easier to use grep, where you match just the number:

echo "This is an example: 65 apples" | grep -P -o '[0-9]+(?= +apples)'

The -P means "perl regex" (so I don't have to worry about escaping the '+').

The -o means "only print the matches".

The (?= +apples) is a lookahead: it requires the digits to be followed by one or more spaces and the word apples, without including them in the match.

edited Feb 13 '12 at 12:48

answered Feb 13 '12 at 12:43

mathematical.coffee


I think sed doesn't identify the non-greedy `?` identifier. [See this](http://stackoverflow.com/a/1103177/167814). – shadyabhi Feb 13 '12 at 12:45
The first example has now been fixed! (and was fixed before my previous comment) – ctrl-alt-delor Feb 13 '12 at 13:13
I like the idea, but grep -P is not supported on macOS for those reading this. – volvox Jan 13 '18 at 21:33
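
Where grep -P is unavailable, one possible workaround is to replace the lookahead with two passes of grep -E. This is only a sketch: `count_before_apples` is a hypothetical helper name, and -o is assumed to be supported (it is in GNU and BSD grep, but it is an extension):

```shell
# count_before_apples is a hypothetical helper name.
# Pass 1 keeps "<digits> apples"; pass 2 keeps only the digits from that.
count_before_apples() {
    printf '%s\n' "$1" | grep -oE '[0-9]+ +apples' | grep -oE '[0-9]+'
}

count_before_apples "This is an example: 65 apples"   # prints 65
count_before_apples "3 pears and 65 apples"           # prints 65
```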

score 3 · Answer 3 · answered Apr 15 '16 at 07:22

A simple way to extract all numbers from a string:

echo "1213 test 456 test 789" | grep -P -o "\d+"

And the result:

1213
456
789
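
The same result should be obtainable without -P by writing \d as [0-9] and using extended regular expressions. In the sketch below, the variable names are illustrative, and picking one match by line number with sed is an addition for demonstration, not part of the answer:

```shell
# Each match is printed on its own line, as with grep -P -o "\d+".
numbers=$(printf '%s\n' "1213 test 456 test 789" | grep -oE '[0-9]+')
printf '%s\n' "$numbers"

# Since there is one match per line, a single match can be picked by line number.
second=$(printf '%s\n' "$numbers" | sed -n '2p')
printf '%s\n' "$second"   # prints 456
```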

Khate


score 3 · Answer 4 · answered Feb 13 '12 at 12:42

What you are seeing is the greedy behavior of regex. In your first example, .* gobbles up all the digits. Something like this does it:

echo "This is an example: 65144 apples" | sed -n  's/[^0-9]*\([0-9]\+\) apples/\1/p'
65144

This way, the first part of the pattern cannot match any digits. Some regex dialects offer non-greedy matching, but I don't believe sed does.

score 0 · Answer 5 · answered May 20 '23 at 17:25

Nowadays the Rust tool ripgrep is a nice alternative. It is fast, runs on Windows, Linux and macOS, and implements most of the common regex syntax.

echo "This is an example: 65 apples" | rg '\d+' -o
65

The documentation for the -o option states:

-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

score 0 · Answer 6 · answered Feb 13 '12 at 13:07

echo "This is an example: 65 apples" | ssed -nR -e 's/.*?\b([0-9]*) apples/\1/p'

You will however need super-sed for this to work. The -R allows perl regexp.

ctrl-alt-delor

