0

I have

IEnumerable<Tuple<string, string>> allInfo

and IEnumerable<string> info1dim. What is a way to find effectively the diff between info1dim and first dim of allInfo. For example :

allInfo = {<"data1", "addinfo1">, <"data2", "addinfo2">, <"data3", "addinfo3">"

and

info1dim = {"data3", "data1", "data4"}

The result I expect is

{"diff4"}

What is the most efficient way to do that? I don't want to run two loops. The IEnumerables are huge (~100000 elements)

ThiefMaster
  • 310,957
  • 84
  • 592
  • 636
YAKOVM
  • 9,805
  • 31
  • 116
  • 217
  • Assuming .NET has something similar to a Python `set`: Iterate the first enumerable and store all the keys in a set. Then iterate the second one and store them in another set. Now you can simply get the difference between the two sets (`a - b` and `b - a`) to find out which keys are only in the first/second enumerable. – ThiefMaster Jun 07 '15 at 23:25
  • 2
    "I don't want to run two loops" is an artificial requirement. Your problem *needs* iterating over *both* enumerables. – Ondrej Tucny Jun 07 '15 at 23:30
  • @OndrejTucny - you are right. let`s say - "i am looking for effective builtin approach" – YAKOVM Jun 07 '15 at 23:38
  • I would seriously consider storing the data in a hashbased collection such as Dictionary (as suggested in my answer). With that you have almost instant look up, and you only need to loop over the IEnumerable once, you never actually loopthrough the dictionary. – BjarkeCK Jun 07 '15 at 23:40
  • If the two lists are sorted, you can use two index counter i & j (one for each list). Then compare list1[i] and list2[j]. You have three cases 1) list1[i] == list2[j]. item is in both lists. Increment I & j 1) list1[i] > list2[j]. item is in 2 and not 1. Increment j not i. 2) list1[i] < list2[j] item is in 1 and not 2. increment i not j. – jdweng Jun 07 '15 at 23:41
  • How big a collection is info1dim compared to allInfo? – BjarkeCK Jun 07 '15 at 23:48

4 Answers4

4

The C# HashSet collection has ExceptWith, UnionWith, and IntersectWith methods. What you want could be done like this.

        var set1 = new HashSet<string>(allinfo.Select(t => t.Item1));
        var set2 = new HashSet<string>(info1dim);

        var set1_but_not_set2 = new HashSet<string>(set1);
        set1_but_not_set2.ExceptWith(set2);

        var set2_but_not_set1 = new HashSet<string>(set2);
        set2_but_not_set1.ExceptWith(set1);

Be careful, though, HashSet is a mutable collection and these functions change the collection. You have O(n) operations here. Constructing the HashSet objects requires iterating; so do the ExceptWith operations.

Sehnsucht
  • 5,019
  • 17
  • 27
O. Jones
  • 103,626
  • 17
  • 118
  • 172
2

You could use a LINQ Except() like so:

info1dim.Except(allInfo.Select(i => i.Item1));

Note that Except() uses a HashSet<T> internally (as explained here) so this is still O(n).

Community
  • 1
  • 1
Matt Cole
  • 2,491
  • 17
  • 21
1

Maybe something like this?

var diff = info1dim.Where(x => allInfo.Any(c => c.Item1 == x) == false);

If you store the IEnumerable<Tuple<string, string>> in a Dictionary<string,string> instead it would become ALOT faster! then you could write:

Dictionary<string,string> allInfo;
IEnumerable<string> info1dim;
var diff = info1dim.Where(x => allInfo.ContainsKey(x) == false);
BjarkeCK
  • 5,694
  • 5
  • 41
  • 59
0

load your info1dim in a HashSet and use Remove foreach item in allInfo :

// n: size of info1dim ; m: size of allInfo
var diff = new HashSet<string> (info1dim); // O(n)
foreach (var tuple in allInfo)  // O(m)
    diff.Remove (tuple.Item1);  // O(1)

I didn't recall of ExceptWith existence before Ollie's answer ; after verifying at the source reference ExceptWith basically do the same (foreach -> Remove) and so should be better ; I keep my code as is as informative support tough

Sehnsucht
  • 5,019
  • 17
  • 27