4

What I have is two files, sourcecolumns.txt and destcolumns.txt. What I need to do is compare source to dest and, if the dest doesn't contain the source value, write it out to a new file. The code below works, except that I have case-sensitivity issues like this:

source: CPI
dest: Cpi

These don't match because of capital letters, so I get incorrect output. Any help is always welcome!

string[] sourcelinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt");
string[] destlinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt");

foreach (string sline in sourcelinestotal)
{
    if (destlinestotal.Contains(sline))
    {
    }
    else
    {
        File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt", sline);
    }
}
tvanfosson

3 Answers

5

You could do this using an extension method for IEnumerable<string> like:

public static class EnumerableExtensions
{
    public static bool Contains( this IEnumerable<string> source, string value, StringComparison comparison )
    {
         if (source == null)
         {
             return false; // nothing is a member of the empty set
         }
         return source.Any( s => string.Equals( s, value, comparison ) );
    }
}

then change

if (destlinestotal.Contains( sline ))

to

if (destlinestotal.Contains( sline, StringComparison.OrdinalIgnoreCase ))

However, if the sets are large and/or you are going to do this very often, the way you're going about it is very inefficient. Essentially, you're doing an O(n^2) operation: for each line in the source you compare it with, potentially, all lines in the destination. It would be better to create a HashSet from the destination columns with a case-insensitive comparer and then iterate through your source columns, checking whether each one exists in the HashSet of destination columns. This would be an O(n) algorithm. Note that Contains on the HashSet will use the comparer you provide in the constructor.

string[] sourcelinestotal = 
    File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt"); 
HashSet<string> destlinestotal = 
                new HashSet<string>(
                  File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt"),
                  StringComparer.OrdinalIgnoreCase
                );

foreach (string sline in sourcelinestotal) 
{ 
    if (!destlinestotal.Contains(sline)) 
    { 
        // AppendAllText does not add a line break, so append one explicitly
        File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt", sline + Environment.NewLine);
    } 
}

In retrospect, I actually prefer this solution over simply writing your own case insensitive contains for IEnumerable<string> unless you need the method for something else. There's actually less code (of your own) to maintain by using the HashSet implementation.

tvanfosson
  • @aba - in the general case, the collection might contain the empty string, though perhaps in this case not. – tvanfosson Apr 28 '10 at 18:28
  • I can't get this to compile: Using the generic type 'System.Collections.Generic.HashSet<T>' requires '1' type arguments C –  Apr 28 '10 at 19:46
  • I left out the type specifier on the HashSet constructor. I've fixed this. – tvanfosson Apr 28 '10 at 20:49
  • @tvanfosson - is there a reason for not just doing two IEnumerables containing the lines and using List1.Except(List2, StringComparison.OrdinalIgnoreCase). I would have thought this would implement in the most efficient way? – Martin Apr 03 '12 at 21:14
  • @Martin If the second collection is a list, the `Except`-based solution is likely O(N log N) (or O(N^2)) whereas the `HashSet`-based solution is O(N). The reason for the difference is that you have to either sort both lists so you can do an O(N) comparison (or do an O(N) traversal of the second list for each item in the first). With the HashSet you do an O(1) lookup for each of the items in the first set. – tvanfosson Apr 03 '12 at 21:23
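
For anyone who wants to try the `Except` route raised in the comments, here is a minimal sketch. It is not from the answer: `ExceptSketch`, `FindMissing`, and the sample values are hypothetical names made up for illustration. Note that `Enumerable.Except` takes an `IEqualityComparer<string>`, so the argument would be `StringComparer.OrdinalIgnoreCase` rather than the `StringComparison` enum. As far as I know, the LINQ to Objects implementation collects the second sequence into a hash-based set internally, so it may well perform close to the `HashSet` version, but measure before relying on that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Hypothetical helper: returns the source entries that have no
    // case-insensitive match in dest.
    public static List<string> FindMissing(IEnumerable<string> source, IEnumerable<string> dest)
    {
        // Except expects an IEqualityComparer<string>, hence StringComparer
        // (not the StringComparison enum).
        return source.Except(dest, StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for the two files.
        string[] source = { "CPI", "GDP", "gdp" };
        string[] dest = { "Cpi" };

        List<string> missing = FindMissing(source, dest);
        Console.WriteLine(string.Join(", ", missing)); // prints "GDP"
    }
}
```

One behavioural difference to keep in mind: `Except` yields distinct elements under the given comparer, so "GDP" and "gdp" in the source collapse into a single result, whereas the `foreach` loop in the answer would write both.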
4

Use an extension method for your Contains. A brilliant example was found here on Stack Overflow. The code isn't mine, but I'll post it below.

public static bool Contains(this string source, string toCheck, StringComparison comp) 
{
    return source.IndexOf(toCheck, comp) >= 0;
}

string title = "STRING";
bool contains = title.Contains("string", StringComparison.OrdinalIgnoreCase);
StyxRiver
  • This doesn't solve the problem -- he wants to see if the **collection** of strings contains a particular string in a case-insensitive manner. This only checks if a **string** contains another string in a case-insensitive manner. You'd need to have an extension method on `IEnumerable`, not `string`. – tvanfosson Apr 28 '10 at 18:19
  • I saw your answer, much better than mine. I honestly hadn't considered the performance aspect, and I was unaware of the HashSet's overloaded constructor. This is always a nice extension to have, in either case! – StyxRiver Apr 28 '10 at 18:36
0

If you do not need case sensitivity, convert your lines to upper case using string.ToUpper before comparison.
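
A rough sketch of what that could look like, under a couple of assumptions: `UpperCaseSketch`, `FindMissing`, and the sample values are hypothetical, and `ToUpperInvariant` is swapped in for plain `ToUpper` so the result does not depend on the current culture. The upper-cased destination values are kept in a `HashSet` so each lookup stays cheap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UpperCaseSketch
{
    // Hypothetical helper: normalizes both sides to upper case, then compares.
    public static List<string> FindMissing(string[] source, string[] dest)
    {
        // Upper-case every destination entry once, up front.
        var destUpper = new HashSet<string>(dest.Select(d => d.ToUpperInvariant()));

        // Keep the source entries whose upper-cased form is not in the set.
        // The original casing of each source entry is preserved in the result.
        return source.Where(s => !destUpper.Contains(s.ToUpperInvariant())).ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for the two files.
        string[] source = { "CPI", "GDP" };
        string[] dest = { "Cpi" };

        Console.WriteLine(string.Join(", ", FindMissing(source, dest))); // prints "GDP"
    }
}
```

This allocates a new string for every line on both sides, which a comparer such as `StringComparer.OrdinalIgnoreCase` avoids, so it is probably best suited to small files.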

Danvil