I have a long vocabulary list, one word per line. Some words are duplicated, appearing two or more times. I need simple code that keeps the first occurrence of each word but removes every later duplicate (along with its line).
I don't want to remove any special characters or rearrange anything, only remove the duplicate lines. Keeping the same word order is important.
It doesn't matter if it overwrites the original file or saves to a new one, whichever is "more efficient".
The list is separated by newlines; it is not an array, and the words are not separated by spaces or commas.
I have no code to start with, and I'm hoping to solve this with Bash. sed would be my first choice, grep my second, and my third choice would be something like a for loop.
E.g., file.txt:
apple
banana
car
bicycle
apple
tree
banana
apple
motorcycle
...should become:
apple
banana
car
bicycle
tree
motorcycle
I see some solutions for arrays, but not for simple lists, and answers in Python, JavaScript, and C, but not Bash. If this has already been answered, show me where and I will gladly delete this question. The suggested duplicate uses awk, which is outside the scope of this question, though it is related and useful.
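
For illustration, the transformation in the example above could be sketched with a plain Bash loop (the "third choice" approach, not sed or grep). This is only a minimal sketch under some assumptions: it needs Bash 4 or newer for associative arrays, and the function name dedupe_lines is hypothetical, made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: print each input line the first time it is seen and skip later
# duplicates, keeping the original order.
# Assumption: Bash 4+ (associative arrays). dedupe_lines is a made-up name.

dedupe_lines() {
    local -A seen=()   # keys are the lines already printed
    local line key
    # IFS= and -r keep leading/trailing whitespace and backslashes intact;
    # the || test also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        key="x$line"   # prefix so an empty line is still a valid key
        if [[ -z ${seen[$key]+set} ]]; then
            seen[$key]=1
            printf '%s\n' "$line"
        fi
    done
}

# Small demo on inline data:
printf '%s\n' apple banana apple tree banana | dedupe_lines
# prints apple, banana, tree (one per line)
```

To apply it to a file, the function reads standard input and writes standard output, so something like `dedupe_lines < file.txt > file.new.txt` would save the result to a new file. Writing to a new file avoids reading and truncating the same file in a single redirection.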