Taking into account how much memory a string will take in C#, and assuming 10 characters length for 6 million records we get:
- size in bytes ~= 20 + (length / 2 ) * 4;
- total size in bytes ~= (20 + ( 10 / 2 ) * 4 )* 6000000 = 240 000 000
- total size in Mb ~= 230
Now, 230 MB of space is not really a problem, even on x86 (32 bit system), so you can load all that data in memory.
For this, I would use a HashSet class which is obviously, a hash set that will let you easily eliminate the duplicates, by using lookup before adding an element.
In terms of big-O notation for time complexity, the average performance of a lookup in a hash set is O(1), which is the best you can get. In total, you would use lookup N times, totalling to N * O(1) = O(N)
In terms of big-O notation for space complexity, you would have O(N) space used, meaning that you use up memory proportional to number of elements, which is also the best you can get.
I'm not sure it is even possible to use up less space if you implement the algorithm in C# and not rely on any external components (that would also use at least O(N))
That being said, you can optimize for some scenarios by reading your file sequentially, line by line, see here.
This would give a better result if you have lots of duplicates, but worst case scenario when all the lines are distinct would consume the same amount of memory.
On a final note, if you look how Distinct method is implemented, you will see it also uses an implementation of hash table, although it's not the same class, but the performance is still roughly the same, check out this question for more details.