I have a text file with contents that may be duplicates. Below is a simplified representation of my txt file. text
means a unique character or word or phrase). Note that the separator ----------
may not be present. Also, the whole content of the file consists of unicode Japanese and Chinese characters.
EDITED
sometext1
sometext2
sometext3
aaaa
sometext4
aaaa
aaaa
bbbb
bbbb
cccc
dddd
eeee
ffff
gggg
----------
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
What I want to achieve is to keep only the line with the last occurrence of the duplicates like so:
sometext1
sometext2
sometext3
sometext4
aaaa
bbbb
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
The closest I found online is How to remove only the first occurrence of a line in a file using sed but this requires that you know which matching pattern(s) to delete. The suggested topics provided when writing the title gives Duplicating characters using sed and last occurence of date but they didn't work.
I am on a Mac with Sierra. I am writing my executable commands in a script.sh file to execute commands line by line. I'm using sed
and gsed
as my primary stream editors.