
I have a large text file with over 100,000 lines. Some of the lines are duplicates. I would like to be able to dedupe these entries before processing them. I am using Visual Basic 2010 Express to write this.

Text file example:

132165
165461
646843
654654
321358
132165
165461
David Heffernan
Eric Fluharty
    Thank you for posting your requirements. Now post your attempt... – Mitch Wheat Sep 12 '13 at 15:17
    It seems to me that you don't want to de-dupe the file. You want us to. You need to show some more effort. At the very least you need to show us that you can program. SO isn't a site where you write code for people that have no clue at all. – David Heffernan Sep 12 '13 at 15:20
  • To help get you started, I would just read the file and split it into an array at the line breaks. Then create another array by iterating over the original one. For each element of the original array, check whether it is already in your new one; if it isn't, add it. After this is done, just overwrite your original text file with your results. This isn't terribly efficient, but it will do the job. – Derek Meyer Sep 12 '13 at 15:25
    The OP is looking for a method, not code. There's nothing wrong with asking for an algorithm on SO, and it is very difficult to write code (as in an attempt) before you have the algorithm. – xpda Sep 12 '13 at 15:53

1 Answer


I want to dedupe these entries before processing them

You can use a HashSet(Of T), which stores each value only once and silently ignores values it already contains:

' path holds the full name of the input file; requires Imports System.IO
Dim nodupes As New HashSet(Of String)(File.ReadLines(path))
For Each str As String In nodupes
    ' no duplicate here '
Next

Edit: Since a HashSet(Of T) does not guarantee that insertion order is preserved, you can use the following code if you need to keep the original order of the lines:

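' Requires Imports System.IO and Imports System.Linq
Dim nodupeSet As New HashSet(Of String)
' HashSet.Add returns False for a line it already contains,
' so the Where clause keeps only the first occurrence of each line
Dim nodupes = From line In File.ReadLines(path)
              Where nodupeSet.Add(line)
' Enumerate nodupes only once: the query runs lazily, and afterwards the set already holds every line
For Each str As String In nodupes
    ' no duplicate here '
Next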
Tim Schmelter
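Combining the order-preserving HashSet(Of T) approach with Derek Meyer's suggestion to overwrite the original file with the results, a minimal sketch might look like the module below. DedupeFile is a hypothetical helper name, and the temporary file in Main exists only for illustration; the sample values are taken from the question. Because the For Each loop finishes reading the input (and closes it) before File.WriteAllLines runs, the output path may be the same as the input path.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module DedupeExample

    ' Hypothetical helper: keeps only the first occurrence of each line of
    ' inputPath (original order preserved), writes the result to outputPath
    ' and returns the number of unique lines written.
    Function DedupeFile(inputPath As String, outputPath As String) As Integer
        Dim seen As New HashSet(Of String)()
        Dim unique As New List(Of String)()
        For Each line As String In File.ReadLines(inputPath)
            ' HashSet.Add returns False when the line was already seen
            If seen.Add(line) Then
                unique.Add(line)
            End If
        Next
        ' The input has been read completely and closed at this point,
        ' so outputPath may be the same file as inputPath.
        File.WriteAllLines(outputPath, unique)
        Return unique.Count
    End Function

    Sub Main()
        ' Illustration only: build a temporary file from the question's sample data
        Dim tempPath As String = Path.GetTempFileName()
        File.WriteAllLines(tempPath, New String() {"132165", "165461", "646843", "654654", "321358", "132165", "165461"})

        Dim count As Integer = DedupeFile(tempPath, tempPath)
        Console.WriteLine(count) ' prints 5

        For Each line As String In File.ReadLines(tempPath)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

Collecting the unique lines in a List(Of String) keeps the whole file in memory, which is fine for roughly 100,000 short lines; for much larger files you could instead write each new line to a StreamWriter on a different output file as you go.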