How can I delete every line in which one word (not the number) has already appeared? Here is an example:

4,5876746600174000,dog 
4,5876736392287000,nacho 
4,5876692287755000,dog 
4,5876684072439000,tree
4,5876692287773600,dog
4,5876692879655000,dog 
4,5876692434755000,frog

I would like to automatically delete every line that says "dog" after its first occurrence, including all of that line's content. Any ideas? It is obviously a CSV file, so maybe it is quicker with OpenOffice?

1 Answer

To omit lines whose 3rd field repeats, you can use:

sort -t, -k3 file.csv|sort -t, -k3 -u

which will give you:

4,5876692287755000,dog
4,5876692434755000,frog
4,5876736392287000,nacho
4,5876684072439000,tree

Do NOT try to shorten this to a single sort: sort -t, -k3 -u file.csv will give you different results:

4,5876746600174000,dog
4,5876692434755000,frog
4,5876736392287000,nacho
4,5876684072439000,tree

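Note that the second field on the dog line has a different value. With GNU sort, a single sort -u keeps the first dog line as it appears in the input file, whereas the two-step pipeline keeps the dog line that sorts first (the one with the smallest second field).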
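
If the goal is exactly what the question describes (keep the first "dog" line in file order and leave the other lines unsorted), an awk filter is a common alternative to sort. The following is a minimal sketch, not part of the original answer. It assumes the third field has no trailing whitespace (otherwise "dog " and "dog" count as different words), and the file names file.csv and deduped.csv are only placeholders:

```shell
# Sample data from the question (trailing spaces removed).
cat > file.csv <<'EOF'
4,5876746600174000,dog
4,5876736392287000,nacho
4,5876692287755000,dog
4,5876684072439000,tree
4,5876692287773600,dog
4,5876692879655000,dog
4,5876692434755000,frog
EOF

# Print a line only the first time its third field is seen.
# seen[$3]++ evaluates to 0 on first sight, so !seen[$3]++ is true once per word.
awk -F, '!seen[$3]++' file.csv > deduped.csv
cat deduped.csv
```

Unlike the sort-based commands, this keeps the lines in their original order.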

It may also be worth importing the CSV file into a database (SQLite, for example). Then you can run a SELECT with GROUP BY.
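
The database route could look roughly like the sketch below. It assumes the sqlite3 command-line shell is installed. The table name t and the column names a, b and word are made up for illustration, and MIN(b) is just one possible rule for choosing which line to keep per word:

```shell
# Sample data from the question (trailing spaces removed).
cat > file.csv <<'EOF'
4,5876746600174000,dog
4,5876736392287000,nacho
4,5876692287755000,dog
4,5876684072439000,tree
4,5876692287773600,dog
4,5876692879655000,dog
4,5876692434755000,frog
EOF

# Create the table first so that .import treats every row as data,
# then switch to list mode with a comma separator for plain output.
sqlite3 :memory: > grouped.csv <<'SQL'
CREATE TABLE t(a INTEGER, b INTEGER, word TEXT);
.mode csv
.import file.csv t
.mode list
.separator ","
-- One row per word, keeping the smallest second field.
SELECT a, MIN(b), word FROM t GROUP BY word ORDER BY word;
SQL
cat grouped.csv
```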

pawel7318