This is my first post here but I'm hoping you can help me out with this as it's doing my head in!
I have a CSV file containing a lot of data (~250,000 lines) and I need to remove the duplicate entries. Only certain columns in each row should be tested for duplicates, but the other columns still need to appear in the end result. The columns to test are Date, Lat and Lon. For example, if I start with this data:
Date Time Mag Lat Lon Depth Event
01/01/2008 01:38:25 1.04 35.5152 -120.8587 4.15 71091831
01/01/2008 01:44:27 0.84 38.8215 -122.8132 2.55 51193664
01/01/2008 01:46:59 0.48 38.8298 -122.811 2.44 51193666
01/01/2008 01:44:29 0.86 38.8215 -122.8132 2.76 51276634
01/01/2008 02:02:32 0.32 38.8193 -122.7968 5.86 51193667
The fourth data row should be removed, as it has the same Date, Lat and Lon as the second data row, so the output would be:
Date Time Mag Lat Lon Depth Event
01/01/2008 01:38:25 1.04 35.5152 -120.8587 4.15 71091831
01/01/2008 01:44:27 0.84 38.8215 -122.8132 2.55 51193664
01/01/2008 01:46:59 0.48 38.8298 -122.811 2.44 51193666
01/01/2008 02:02:32 0.32 38.8193 -122.7968 5.86 51193667
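To make the rule concrete, here is a minimal sketch in Python of what I mean. It is only one possible approach, and it makes some assumptions: the fields are separated by whitespace as in the sample above (a truly comma-separated file would need the csv module instead), the first row seen for each Date/Lat/Lon combination is the one to keep, and values are compared as text, so 38.8215 and 38.82150 would count as different. The name dedupe_rows is made up for illustration.

```python
import io

def dedupe_rows(lines, key_columns=("Date", "Lat", "Lon")):
    """Yield the header, then the first row seen for each key combination."""
    lines = iter(lines)
    header = next(lines).split()
    # Positions of the columns that decide whether two rows are duplicates.
    key_idx = [header.index(name) for name in key_columns]
    yield header

    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        key = tuple(fields[i] for i in key_idx)  # compared as plain text
        if key not in seen:
            seen.add(key)
            yield fields  # the whole row is kept, not just the key columns

sample = """\
Date Time Mag Lat Lon Depth Event
01/01/2008 01:38:25 1.04 35.5152 -120.8587 4.15 71091831
01/01/2008 01:44:27 0.84 38.8215 -122.8132 2.55 51193664
01/01/2008 01:46:59 0.48 38.8298 -122.811 2.44 51193666
01/01/2008 01:44:29 0.86 38.8215 -122.8132 2.76 51276634
01/01/2008 02:02:32 0.32 38.8193 -122.7968 5.86 51193667
"""

result = [" ".join(row) for row in dedupe_rows(io.StringIO(sample))]
for row in result:
    print(row)
```

For the real file, an open file object could be passed in place of io.StringIO(sample). Because only the key tuples are held in a set, the memory needed grows with the number of distinct Date/Lat/Lon combinations, which should be manageable for ~250,000 lines.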
Thanks in advance!
Tom