Bash - Remove duplicates preserve order

Question

I have a file that looks like this:

1254543534523233434
3453453454323233434
2342342343223233535
0909909092324243535

Is there a command in bash to remove duplicate lines from the file above, based on a specific substring, without changing the order of the remaining lines?

For example, with the substring ${line:11:8} as the key, the output should be:

1254543534523233434
2342342343223233535
0909909092324243535

I know that:

sort -u : sorts the lines, then removes duplicates
sort -kx,x -u : does the same, keyed on field x
cat filein | uniq : only removes adjacent duplicates, so the input must already be sorted or it will not work

I'm trying to figure out if there's a native Linux solution, without having to resort to Perl code for it. Thank you in advance.

This is not an exact duplicate. It has the additional constraint of comparing lines based only on a substring, but printing the complete line. However, the [answer](http://stackoverflow.com/questions/1444406/how-can-i-delete-duplicate-lines-in-a-file-in-unix) should be easily extended to `awk '!seen[substr($0, 12, 8)]++' file.txt` (awk positions are 1-based, so offset 11 in bash is position 12 in awk). – Martin Nyolt, Aug 22 '16 at 09:56

score 7 · Accepted Answer · answered Aug 22 '16 at 09:59

You can use awk, with no need for sorting:

awk '!uniq[substr($0, 12, 8)]++' file

1254543534523233434
2342342343223233535
0909909092324243535

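awk string positions start at 1, so substr($0, 12, 8) returns the 8 characters starting at the 12th position, which is the same text as bash's zero-based ${line:11:8}.
uniq is an associative array keyed by the substring that substr returns.
uniq[key]++ returns the current count (0 the first time a key appears) and then increments it, so !uniq[key]++ is true only for the first line with a given key, and awk prints that line by default.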
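
For readers who find the one-liner terse, here is a rough sketch of the same idea spelled out step by step. The function name dedupe_by_substring and the variables start, len, key and seen are made up for illustration and are not part of the original answer; start is a 1-based awk position, so a bash offset of 11 becomes 12.

```shell
# Hypothetical helper: print only the first line seen for each key, where the
# key is the substring of length $2 starting at 1-based position $1.
# Reads stdin, writes stdout, and keeps the input order.
dedupe_by_substring() {
    awk -v start="$1" -v len="$2" '
        {
            key = substr($0, start, len)   # the part of the line to compare
            if (!(key in seen)) {          # first occurrence of this key?
                seen[key] = 1              # remember it
                print                      # print the complete line
            }
        }
    '
}
```

It could be called as dedupe_by_substring 12 8 < file. Note that the seen array holds every distinct key in memory, which is usually fine but may matter for very large inputs with many unique keys.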

answered Aug 22 '16 at 09:59

anubhava


This worked perfectly, thank you. – onlyf Aug 22 '16 at 10:11
