I'm writing a program that simply reads two different .csv files containing the following information:

file 1                  file2
AA,2.34                BA,6.45
AB,1.46                BB,5.45
AC,9.69                BC,6.21
AD,3.6                 AC,7.56

The first column is a string, the second a double.

So far I have had no difficulty reading those files and placing the values into lists:

firstFile = new List<KeyValuePair<string, double>>();
secondFile = new List<KeyValuePair<string, double>>();

I'm trying to instruct my program:

  • to take the value in the first column of the first row of the first file (in this case AA)
  • and look for a match anywhere in the first column of the second file.
  • If a string match is found, compare the corresponding second values (doubles in this case), and if those match as well, add the entire row to a separate List.

Something similar to the below pseudo-code:

for(var i=0;i<firstFile.Count;i++)
{
    firstFile.Column[0].value[i].SearchMatchesInAnotherFile(secondFile.Column[0].values.All);
    if(MatchFound)
    {
        CompareCorrespondingDoubles();
        if(true)
        {
            AddFirstValueToList();
        }
    }
}

Instead of a List I tried to use a Dictionary, but that data structure is not sorted and there is no way to access a key by index.

I'm not asking for exact code; rather, my question is:

What would you suggest as an appropriate data structure for this program, so that I can investigate further myself?

TiredOfProgramming
  • Dictionary should do exactly what you're after. `myDictionary[key]` returns the value assigned to a given key. To quickly convert your existing lists you could use this linq: https://stackoverflow.com/a/4022334/361842 – JohnLBevan Apr 24 '18 at 13:26
  • can a key repeat itself in the same file? or is it unique? – Mong Zhu Apr 24 '18 at 13:28
  • No. Duplicate keys are not allowed in dictionaries – Captain Obvious Apr 24 '18 at 13:29
  • @MongZhu No the key won't duplicate in the same file in my program. That's the logic. – TiredOfProgramming Apr 24 '18 at 13:29
  • @Butler1233: I think MongZhu's asking about TiredOfProgramming's requirements; i.e. to see if a dictionary is appropriate for this use case. – JohnLBevan Apr 24 '18 at 13:29
  • For a sorted dictionary you can look to System.Collections.Generic.SortedDictionary. – CodeNotFound Apr 24 '18 at 13:30
  • You can loop through the entries of a dictionary with foreach (var item in firstFile) and then use the dictionary key, which would be your string in this case, to do the comparison. This should help, but you can ask for some code if you need a guide. – Adeoluwa Simeon Apr 24 '18 at 13:30
  • Do I understand correctly that you want to create a list containing all rows that exist in both files? If not, please provide sample input and the output you want to achieve. If yes, you may use linq's [`Intersect`](https://msdn.microsoft.com/en-us/library/bb355408(v=vs.110).aspx) with a custom `IEqualityComparer<KeyValuePair<string, double>>` that can compare your entries. – René Vogt Apr 24 '18 at 13:30
  • @RenéVogt exactly, if string and double values match happens in the second file (basically row in first file matches row in second file), I would like to place it to the List – TiredOfProgramming Apr 24 '18 at 13:34
  • @MongZhu typo has been corrected :-) – TiredOfProgramming Apr 24 '18 at 13:42

1 Answer

KeyValuePair is really only meant for use with Dictionaries. I suggest creating your own custom type:

public class MyRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }

    // Two rows are equal when both the string and the double match.
    public override bool Equals(object o)
    {
        MyRow r = o as MyRow;
        if (ReferenceEquals(r, null)) return false;
        return r.StringValue == this.StringValue && r.DoubleValue == this.DoubleValue;
    }

    // Must stay consistent with Equals: equal rows yield equal hash codes.
    public override int GetHashCode()
    {
        unchecked { return (StringValue?.GetHashCode() ?? 0) ^ DoubleValue.GetHashCode(); }
    }
}

And store the files in lists of this type:

List<MyRow> firstFile = ...
List<MyRow> secondFile = ...

Then you can determine the intersection (all elements that occur in both lists) via LINQ's Intersect method:

var result = firstFile.Intersect(secondFile).ToList();

It's necessary to override Equals and GetHashCode, because otherwise Intersect would only make a reference comparison. Alternatively, you could implement your own IEqualityComparer<MyRow> that does the comparison and pass it to the appropriate Intersect overload.
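For the comparer alternative, a minimal sketch might look like the following. The names PlainRow, PlainRowComparer and ComparerDemo.CommonRows are made up for illustration; PlainRow stands in for a row type that does not override Equals and GetHashCode itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type WITHOUT Equals/GetHashCode overrides.
public class PlainRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }
}

// Hypothetical comparer: treats two rows as equal when both fields match.
public sealed class PlainRowComparer : IEqualityComparer<PlainRow>
{
    public bool Equals(PlainRow a, PlainRow b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return a.StringValue == b.StringValue && a.DoubleValue == b.DoubleValue;
    }

    public int GetHashCode(PlainRow row)
    {
        if (ReferenceEquals(row, null)) return 0;
        // Combine both fields so that rows equal under Equals hash identically.
        unchecked
        {
            return ((row.StringValue?.GetHashCode() ?? 0) * 397) ^ row.DoubleValue.GetHashCode();
        }
    }
}

public static class ComparerDemo
{
    // Returns the rows of 'first' that also occur in 'second', compared by value.
    public static List<PlainRow> CommonRows(List<PlainRow> first, List<PlainRow> second)
    {
        return first.Intersect(second, new PlainRowComparer()).ToList();
    }
}
```

This keeps the row type itself free of equality logic, which can be handy if different parts of a program need different notions of "same row".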


But if you can ensure that the keys (the string values) are unique, you can also use dictionaries:

Dictionary<string, double> firstFile = ...    
Dictionary<string, double> secondFile = ...

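And in this case use this LINQ statement:

var result = firstFile
          .Where(x => secondFile.TryGetValue(x.Key, out double value) && value == x.Value)
          .ToDictionary(x => x.Key, x => x.Value);

Each lookup in secondFile is a hash lookup, so the query itself takes O(m) expected time, or O(m+n) including building both dictionaries (for m and n being the row counts of the two files). Scanning secondFile linearly for every row, for example with FirstOrDefault, would be O(m*n) instead. Intersect is hash-based as well, so the upper solution is also roughly O(m+n).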
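As a self-contained sketch of the dictionary approach, assuming the keys are unique within each file: the helper name DictionaryDemo.MatchingRows is made up for illustration, and the lookup is done with Dictionary's TryGetValue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryDemo
{
    // Hypothetical helper: keeps the entries of 'first' whose key also exists
    // in 'second' with exactly the same double value.
    public static Dictionary<string, double> MatchingRows(
        Dictionary<string, double> first,
        Dictionary<string, double> second)
    {
        return first
            .Where(x => second.TryGetValue(x.Key, out double other) && other == x.Value)
            .ToDictionary(x => x.Key, x => x.Value);
    }
}
```

With the sample files from the question the result would be empty: AC is the only key present in both files, but 9.69 and 7.56 differ. Note that == on doubles is an exact comparison; that is fine for values parsed from identical text, but computed values may need a tolerance.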

René Vogt
  • I was thinking about this solution, but didn't want to go that complex in my program. Let me try it in the code, and I'll reply later with the results. Thanks for helping. – TiredOfProgramming Apr 24 '18 at 13:45
  • @TiredOfProgramming You can use `Tuple` instead of `MyRow` which already has a built-in `Equals` and `GetHashCode`. – Spotted Apr 24 '18 at 13:48
  • @Spotted right, thought about that.. however, I like specific custom types, because often you want to add further properties or methods. But in this special case, a `Tuple` would be sufficient. – René Vogt Apr 24 '18 at 13:50
  • Are we talking here about firstFile = new List<Tuple<string, double>>(); – TiredOfProgramming Apr 24 '18 at 13:52
  • @TiredOfProgramming yes – René Vogt Apr 24 '18 at 13:52
  • @RenéVogt Fair enough, it's only a maintainability vs conciseness choice. – Spotted Apr 24 '18 at 13:52
  • @RenéVogt regarding your second solution with Dictionaries, it is impossible to implement as operator ? cannot be applied to an operand of type KeyValuePair – TiredOfProgramming Apr 25 '18 at 03:04
  • I actually implemented @Spotted's suggestion of using Tuple and it worked fine for me. I suspect René Vogt's solutions would also work correctly, but they need lots of additional code for my tiny program. – TiredOfProgramming Apr 25 '18 at 03:47
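
The Tuple variant discussed in the comments above could be sketched roughly as follows. The helper name TupleDemo.CommonRows is made up for illustration; the sketch relies on Tuple<string, double> comparing its items in Equals and GetHashCode, so no custom equality code is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleDemo
{
    // Hypothetical helper: returns the rows that occur in both lists.
    public static List<Tuple<string, double>> CommonRows(
        List<Tuple<string, double>> first,
        List<Tuple<string, double>> second)
    {
        // Tuple<T1, T2> overrides Equals and GetHashCode structurally,
        // so Intersect compares rows by content rather than by reference.
        return first.Intersect(second).ToList();
    }
}
```

Rows would be created with Tuple.Create(name, value) while reading each file; the trade-off is that Item1 and Item2 are less descriptive than named properties on a custom type.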