I have a text file with 14,000 lines, but many of these are duplicates. I want to count the unique lines; however, I only have access to .NET Framework 3.0 and below. Is it possible to do this without using LINQ?
-
Is the file "sorted" (ie are the duplicates one after another or are they anywhere in the file) ? – krtek Dec 17 '11 at 18:43
-
Framework 3 does support LINQ, right? Did you check and can you be specific about Fx and C# versions? – H H Dec 17 '11 at 18:46
-
No, they're all over the place. – Evildommer5 Dec 17 '11 at 18:46
-
@HenkHolterman Maybe it does, but I don't want to use it. – Evildommer5 Dec 17 '11 at 18:48
-
@HenkHolterman: No, it doesn't. LINQ was introduced in .NET 3.5. – Jon Skeet Dec 17 '11 at 18:53
-
And does it have to work on 1.x as well? Makes a big difference. – H H Dec 17 '11 at 19:02
-
I'm using 3.0 and below; I can't use anything higher. – Evildommer5 Dec 17 '11 at 19:07
-
That's not the answer we need here. Do you use versions below 2.0? – H H Dec 17 '11 at 19:09
-
You could use ILSpy on a newer project and see how they do it in LINQ :) – Niklas Dec 17 '11 at 19:12
-
LINQ doesn't enable new things to happen; it's just a grammar for expressing your code (gross simplification). So the answer to that portion of your question is: yes, it's possible. – Cj S. Dec 17 '11 at 19:33
2 Answers
Of course it's possible. You can loop through each line using StreamReader.ReadLine and add each line to a Hashtable, using the line as the key and some dummy object as the value. Before adding the string, though, you should check that the Hashtable doesn't already contain the key:
// Requires: using System.Collections; using System.IO;
// reader and writer are assumed to be an open StreamReader and StreamWriter.
Hashtable uniqueLines = new Hashtable();
string line;
// Read each line of the file until the end
while ((line = reader.ReadLine()) != null)
{
    // Check that we have not seen this string before
    if (!uniqueLines.ContainsKey(line))
    {
        uniqueLines.Add(line, 0);
        // You can write the lines to another file if necessary
        writer.WriteLine(line);
    }
}
At the end, the number of items in the Hashtable equals the number of unique lines in the file:
int count = uniqueLines.Count;
// And don't forget to close the reader (and writer)!
Why does this work? Because the Hashtable uses the hash code returned by GetHashCode() to locate keys, and according to MSDN:
If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
Two different strings having the same hash code does not affect the count, because the Hashtable also compares keys with Equals when their hash codes collide. As I understand it, many LINQ methods use hash-based collections internally, so this may be the closest to what LINQ would do.
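
For reference, the same idea can be written as a complete program. This is only a sketch, assuming .NET 2.0 or later so that the generic Dictionary<TKey, TValue> is available (HashSet<T> only arrived in .NET 3.5). The class name, the method name, and the "input.txt" default path are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class UniqueLineCounter
{
    // Counts distinct lines in the file at the given path.
    // String keys are compared ordinally, so "Foo" and "foo" count as different lines.
    static int CountUniqueLines(string path)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();

        // The using block closes the reader even if an exception is thrown.
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // The indexer adds the key or overwrites the existing entry,
                // so duplicate lines collapse into a single key.
                seen[line] = true;
            }
        }

        return seen.Count;
    }

    static void Main(string[] args)
    {
        // "input.txt" is a placeholder; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "input.txt";
        Console.WriteLine(CountUniqueLines(path));
    }
}
```

Using the indexer instead of ContainsKey followed by Add does a single lookup per line. If you also need to write each unique line to another file, keep the explicit ContainsKey check.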

-
-
The type or namespace name 'HashTable' could not be found (are you missing a using directive or an assembly reference?) (CS0246) - C:\Users\Stefan\Dropbox\C\Assigment\Assigment\Program.cs:50,3 – Evildommer5 Dec 17 '11 at 19:01
-
-
Yes it is, but you need to import the `System.Collections` namespace. – Michiel van Oosterhout Dec 17 '11 at 19:07
I think you could also write it with LINQ:
var result = from p in File.ReadAllLines(filepath)
             group p by p into g
             select new { Key = g.Key, Count = g.Count() };
It is easy to read.
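
The query above yields each distinct line along with how many times it occurs. If only the number of unique lines is needed, a shorter LINQ form might look like the sketch below. It assumes .NET 3.5 or later (where LINQ is available), so it does not fit the 3.0 constraint in the question. The class name and the "input.txt" default path are placeholders.

```csharp
using System;
using System.IO;
using System.Linq;

class LinqUniqueLineCounter
{
    static void Main(string[] args)
    {
        // "input.txt" is a placeholder; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "input.txt";

        // Distinct() drops repeated lines; Count() then counts what is left.
        int uniqueCount = File.ReadAllLines(path).Distinct().Count();

        Console.WriteLine(uniqueCount);
    }
}
```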
