-3

I have a text file with 14,000 lines, but many of them are duplicates. I want to count the unique lines, but I only have access to .NET Framework 3.0 and below. Is it possible to do this without using LINQ?

Evildommer5

2 Answers

3

Of course it's possible. You can loop through the file with StreamReader.ReadLine and add each line to a Hashtable, using the line as the key and some dummy object as the value. Before adding a line, check that the Hashtable doesn't already contain that key:

// reader is an open StreamReader; writer is an optional StreamWriter
System.Collections.Hashtable uniqueLines = new System.Collections.Hashtable();
string line;

// Read each line of the file until the end
while ((line = reader.ReadLine()) != null)
{
  // Check that we have not seen this line before
  if (!uniqueLines.ContainsKey(line))
  {
    uniqueLines.Add(line, 0);

    // You can write the unique lines to another file if necessary
    writer.WriteLine(line);
  }
}

At the end, the number of items in the Hashtable equals the number of unique lines in the file:

int count = uniqueLines.Count;
// And don't forget to close the reader (and writer)!
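
Putting the pieces together, a minimal self-contained sketch might look like the following. The class and method names (`UniqueLineCounter`, `CountUniqueLines`, `CountUniqueLinesInFile`) are hypothetical, and the sketch swaps the non-generic Hashtable for the generic `Dictionary<string, bool>` (available since .NET 2.0), so no LINQ is needed. The `using` block closes the reader even if an exception is thrown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class; the names are illustrative, not from the answer above.
public static class UniqueLineCounter
{
    // Counts distinct lines read from any TextReader (file, string, ...).
    public static int CountUniqueLines(TextReader reader)
    {
        // The bool value is a dummy; only the keys matter.
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // The indexer adds the key if it is missing and overwrites it otherwise,
            // so no separate ContainsKey check is needed.
            seen[line] = true;
        }
        return seen.Count;
    }

    // Convenience wrapper that opens the file and disposes the reader afterwards.
    public static int CountUniqueLinesInFile(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            return CountUniqueLines(reader);
        }
    }
}
```

Note that the default string comparison here is ordinal and case-sensitive, so "Foo" and "foo" count as two different lines.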

Why does this work? Because the Hashtable uses the hash code returned by GetHashCode(), and according to MSDN:

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

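I'm not sure how often two different strings end up with the same hash code, but it doesn't affect the count: when hash codes match, the Hashtable also compares the keys with Equals, so different strings are still stored as separate entries. As I understand it, many LINQ methods use hash-based collections internally, so this may be the closest to what LINQ would do.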
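
The effect of colliding hash codes on the count can be checked directly. The sketch below uses a hypothetical key type, `CollidingKey`, whose GetHashCode always returns the same value, and a hypothetical `CollisionDemo` helper. The Hashtable still keeps unequal keys apart, because it compares keys with Equals once their hash codes match.

```csharp
using System;
using System.Collections;

// Hypothetical key type whose hash codes always collide (for illustration only).
public sealed class CollidingKey
{
    private readonly string text;

    public CollidingKey(string text) { this.text = text; }

    // Every instance reports the same hash code.
    public override int GetHashCode() { return 1; }

    // Equality is still based on the wrapped string.
    public override bool Equals(object obj)
    {
        CollidingKey other = obj as CollidingKey;
        return other != null && other.text == text;
    }
}

public static class CollisionDemo
{
    // Counts distinct values even though all keys share one hash code.
    public static int CountDistinct(string[] values)
    {
        Hashtable table = new Hashtable();
        foreach (string value in values)
        {
            CollidingKey key = new CollidingKey(value);
            if (!table.ContainsKey(key))
            {
                table.Add(key, null);
            }
        }
        return table.Count;
    }
}
```

Calling `CollisionDemo.CountDistinct(new string[] { "a", "b", "a" })` returns 2: the collision only costs lookup speed, not correctness.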

Michiel van Oosterhout
0

I think you could also write it with LINQ.

     var result = from p in File.ReadAllLines(filepath)
         group p by p into g
         select new { Key = g.Key, Count = g.Count() };

It is easy to read.
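
The query above yields one group per distinct line, so the number of unique lines is the number of groups. A minimal sketch, assuming a hypothetical `LinqLineStats` helper and a compiler and framework that support LINQ (.NET 3.5 or later, so it does not apply on 3.0):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; requires LINQ (System.Core, .NET 3.5+).
public static class LinqLineStats
{
    public static int CountUniqueLines(IEnumerable<string> lines)
    {
        // Same grouping query as above: one result per distinct line.
        var result = from p in lines
                     group p by p into g
                     select new { Key = g.Key, Count = g.Count() };

        // The number of groups is the number of unique lines.
        return result.Count();
    }
}
```

For a file, the input could be `File.ReadAllLines(filepath)`, as in the query above.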

shenhengbin