I know I can use `sort --unique` to remove the duplicate rows in a text file (or on the standard input). But what if I want to maintain the original order of rows?
I know that if the duplicates happen to be consecutive, `uniq` does the trick; but in my case, the duplicates might be farther apart from each other.
Also, I realize I can write a small program to do this in C, or perhaps in Python, but I would like to do it in bash. A naive solution would be to use a bash associative array (dictionary) as a set and add each line to it... but I doubt this would scale very well.
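For reference, the naive approach I have in mind would look roughly like the sketch below. It assumes bash 4 or later (for `declare -A` / `local -A`), and `dedup_keep_order` is just a name I made up for illustration:

```shell
#!/usr/bin/env bash
# Naive order-preserving de-duplication, using an associative array as a set.
# Assumes bash >= 4; dedup_keep_order is a hypothetical helper name.
dedup_keep_order() {
    local -A seen=()
    local line key
    # IFS= and -r keep leading whitespace and backslashes intact;
    # `|| [[ -n $line ]]` handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Prefix the key so that an empty line still gives a non-empty subscript.
        key="x$line"
        # Print the line only the first time its key is seen.
        if [[ -z ${seen[$key]+set} ]]; then
            seen[$key]=1
            printf '%s\n' "$line"
        fi
    done
}
```

It would be used as `dedup_keep_order < input.txt`. Every distinct line is kept in memory and the loop runs line by line in the shell, which is why I doubt it scales.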
Just to illustrate:
| original file | after duplicate removal |
|---|---|
| one<br>two<br>five<br>two<br>two<br>four | one<br>two<br>five<br>four |