explain part of sed expression - *\1$/p

Question

This code outputs lines where the first and last digits are the same. Could somebody explain in English how this works:

seq 1000 | sed -nr -e '/^([0-9])([0-9])*\1$/p'

outputs:

11
22
33 etc

I know it looks for a number at the start ^ and then another number but I am unclear how this works with the \1$ to get the answer?

The second set of parens is unnecessary, as it wraps a single character class. This does the same thing: `sed '/^\([0-9]\)[0-9]*\1$/!d'`. — potong, Dec 07 '11 at 15:33
May I recommend a regex analyzer? You can find one you like (there are several) in the following thread: http://stackoverflow.com/q/2491930/149900 — pepoluan, Dec 09 '11 at 19:23

score 3 · Answer 1 · answered Dec 07 '11 at 14:05

Actually, what this matches is any digit:

([0-9])

followed by any number of digits

([0-9])*

followed by the first digit again

\1

\1 is a backreference to the first parenthesized group.
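As a small sketch of the three pieces working together, the same expression can be run on a few hand-picked inputs (the sample numbers are my own, not from the question). Only the lines whose last digit repeats the first digit should come through:

```shell
# Hypothetical sample inputs chosen for illustration.
# 7 is rejected because the pattern needs at least two digits:
# one for the group and one for the backreference.
matches=$(printf '%s\n' 7 11 12 101 1231 7907 | sed -nr -e '/^([0-9])([0-9])*\1$/p')
echo "$matches"
```

This should print 11, 101, 1231 and 7907, one per line.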

Note that the digits in the middle are unconstrained:

$ seq 8000 | sed -nr -e '/^([0-9])([0-9])*\1$/p' | tail
7907
7917
7927
7937
7947
7957
7967
7977
7987
7997

score 1 · Answer 2 · answered Dec 07 '11 at 14:02

It looks for a digit at the start, followed by zero or more digits (notice the star after the second parenthesis), and lastly checks for \1 at the end, which represents the exact same value that the first parenthesis matched.

Emil Vikström

score 1 · Answer 3 · answered Dec 07 '11 at 14:03

\1 is the "first matched term", that is, whatever the first parenthesized group matched. $ is the "end of line".

So \1$ means "match the same term (i.e. the digit 0-9) found at the start of the string again at the end of the string".
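To see what the $ contributes, here is a rough comparison with and without it (the sample inputs and variable names are my own). Without the anchor, the backreference only has to match somewhere after the first digit, not at the end of the line:

```shell
# Hypothetical sample inputs: 1213 starts with 1 but ends with 3.
anchored=$(printf '%s\n' 1213 1221 | sed -nr -e '/^([0-9])[0-9]*\1$/p')
unanchored=$(printf '%s\n' 1213 1221 | sed -nr -e '/^([0-9])[0-9]*\1/p')

echo "with the anchor:"
echo "$anchored"      # only 1221
echo "without the anchor:"
echo "$unanchored"    # 1213 as well, because the prefix 121 already satisfies \1
```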

JJ.

score 1 · Answer 4 · answered Dec 07 '11 at 14:15

It starts by matching the start of the line; then the parentheses form a group (which can be referenced later) containing one digit 0-9. That group is followed by another group, also containing one digit, and this group can be repeated 0 or more times. After that there is a reference to the first group (the \1) and finally a match for the end of the line.

So, basically, it just says the last digit must be the same as the first digit, and there can be any number of digits between them.

There is no need to group the middle digits since they are not referenced, so it could be rewritten as this

sed -nr -e '/^([0-9])[0-9]*\1$/p'
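A quick way to check that dropping the second group changes nothing is to compare the output of both expressions over the range used in the question (the variable names here are my own):

```shell
# Run the original and the simplified expression over the same input
# and compare the results as strings.
with_group=$(seq 1000 | sed -nr -e '/^([0-9])([0-9])*\1$/p')
without_group=$(seq 1000 | sed -nr -e '/^([0-9])[0-9]*\1$/p')

if [ "$with_group" = "$without_group" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```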

If you instead wanted the last digit to be the same as the first digit and the second-to-last to be the same as the second, so that you would match 1221 and 245642 but not 2424, then you could use

sed -nr -e '/^([0-9])([0-9])[0-9]*\2\1$/p'

Try it with seq 100000
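For a smaller check, the three numbers mentioned above can be fed in directly (a minimal sketch; the variable name is my own):

```shell
# Inputs taken from the examples in the text: 1221 and 245642 should
# be printed, 2424 should not (it ends in 24, not 42).
mirrored=$(printf '%s\n' 1221 245642 2424 | sed -nr -e '/^([0-9])([0-9])[0-9]*\2\1$/p')
echo "$mirrored"
```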
