6

I'm currently working on a web application in asp.net. In certain api-calls it is necessary to compare ListA with a ListB of Lists to determine if ListA has the same elements of any List in ListB. In other words: If ListA is included in ListB.

Both collections are queried with Linq of an EF-Code-First db. ListB has either one matching List or none, never more than one. In the worst case ListB has millions of elements, so the comparison needs to be scalable.

Instead of doing nested foreach loops, i'm looking for a pure linq query, which will let the db do the work. (before i consider multi column index)

To illustrate the structure:

//In reality Lists are queried of EF 
var ListA = new List<Element>();
var ListB = new List<List<Element>>(); 
List<Element> solution;
bool flag = false;
foreach (List e1 in ListB) {
   foreach(Element e2 in ListA) {
        if (e1.Any(e => e.id == e2.id)) flag = true;
        else {
             flag = false;
             break;
        }
    }
        if(flag) {
           solution = e1;
           break;
        }
}

Update Structure

Since its a EF Database i'll provide the relevant Object Structure. I'm not sure if i'm allowed to post real code, so this example is still generic.

//List B
class Result {
       ...
       public int Id;

       public virtual ICollection<Curve> curves; 

       ...
}

class Curve {
       ...
       public int Id;

       public virtual Result result;
       public int resultId;

       public virtual ICollection<Point> points;
       ...
}
public class Point{
    ...
    public int Id;
    ...
}

The controller (for the api-call) wants to serve the right Curve-Object. To identify the right Object, a filter (ListA) is provided (which is in fact a Curve Object) Now the filter (ListA) needs to be compared to the List of Curves in Result (ListB) The only way to compare the Curves is by comparing the Points both have. (So infact comparing Lists) Curves have around 1 - 50 Points. Result can have around 500.000.000 Curves

It's possible to compare by Object-Identity here, because all Objects (even the filter) is re-queried of the db.

I'm looking for a way to implement this mechanism, not how to get around this situation. (e.g. by using multi column index (altering the table))

(for illustration purposes):

class controller {
    ...
    public Response serveRequest(Curve filter) {
         foreach(Curve c in db.Result.curves) {
               if(compare(filter.points , c.points)) return c;

         }
    }
}
Community
  • 1
  • 1
Laurin
  • 177
  • 1
  • 13
  • Your code will not compile, please onlu post real code. obs : it is the `var` – Lucas Feb 23 '17 at 10:58
  • You need to use inner join but without knowing the structure better, its hard to suggest. – Dexion Feb 23 '17 at 11:04
  • Related, but not a dupe because of the EF concerns here: http://stackoverflow.com/questions/9524681/linq-compare-two-lists – H H Feb 23 '17 at 11:06
  • 1
    This part could be clarified: _"worst case ListB has millions of elements"_ - millions of elements or millions of lists? – H H Feb 23 '17 at 11:08
  • 1
    It is an interesting question but because of the word 'millions' I would consider pure or partial SQL solutions as well. – H H Feb 23 '17 at 11:10
  • @HenkHolterman To be precisely; ListB has millions of Lists. ListA and the lists in ListB have between 1 and 50 Elements. – Laurin Feb 23 '17 at 11:21
  • `e1 in ListB` and `e2 in ListA`? You are insane. – Buh Buh Feb 23 '17 at 11:39
  • Are elements (of a list) ordered? – Branko Dimitrijevic Feb 23 '17 at 11:40
  • @BuhBuh Im sorry if my code smells. Im pretty sure im not allowed to post real code, so i try to illustrate my problem with some, more or less, pseudo code. – Laurin Feb 23 '17 at 11:48
  • @BrankoDimitrijevic The Elements are note ordered. – Laurin Feb 23 '17 at 11:50
  • 1
    I think you should solve this with pure SQL, which, with the proper indexing, should be able to provide ample performance on this amount of data. It could be as simple as a single JOIN, in combination with COUNT to ensure all the elements have matched (or any of the other SQL incantations for set equality). – Branko Dimitrijevic Feb 23 '17 at 13:19

4 Answers4

1

Use Except:

    public static bool ContainsAllItems(IList<T> listA, IList<T> listB)
    {
        return !listB.Except(listA).Any();
    }

the above method will tell if listA contains all the elements of listB or not..and the complexity is much faster than O(n*m) approach.

Sombir Kumar
  • 1,841
  • 1
  • 17
  • 30
  • If he is pointing for the same instances in the memory this will work, or if he creates a `IEqualityComparer` – Lucas Feb 23 '17 at 11:50
0

Try this:

bool isIn = ListB.Any(x=>x.Count==ListA.Count && ListA.All(y=>x.Contains(y)));

or, if you want the element

var solution = ListB.FirstOrDefault(x=>x.Count==ListA.Count && ListA.All(y=>x.Contains(y)));
Magnetron
  • 7,495
  • 1
  • 25
  • 41
0

I have something for you:

var db = new MyContext();

var a = db.LoadList(); // or whatever
var b = new List<IQueryable<Entities>>(db.LoadListOfLists()/*or whatever*/);

b.Any(x => x.Count.Equals(a.Count) & x.All(y => a.Any(z => z.Id == y.Id)));
Lucas
  • 1,259
  • 15
  • 25
0

Because performance is concern, I would suggest convert your listA to lookup/dictionary before comparing Ex-

var listALookup = listA.ToLookup(item => item.Id);
var result = listB.FirstOrDefault(childList => childList.Count == listA.Count && childList.All(childListItem => listALookup.Contains(childListItem.Id)));

Lookup.Contain is O(1) while List.Contains is O(n)

Better option is to perform this comparison at db level, to reduce loading unnecessary data.

Kaushal
  • 1,251
  • 1
  • 8
  • 8