7

Basically what I'm trying to do is take a list of objects and filter it based on a number of criteria where one of the criteria is that the key doesn't exist in another list. Here's an example:
My two classes are similar to this:

public class Test  
{  
  public string name;  
  public string instructor_name;  
  public string course;  
}  

public class Appointment  
{  
  public string site;
  public DateTime forWhen;
  public string testName;
}

I want to sort through a List<Test> by looking at the course and making sure that test doesn't exist in the List<Appointment>. In SQL I'd do it something like this:

SELECT new Group<Test>(c.Key, c)
FROM tests in testList
WHERE tests.Course != "Science"
AND tests.name NOT IN (SELECT testName FROM appotList)

However, I can't figure out how I would do this in LINQ. Any ideas?

legacybass
  • 582
  • 1
  • 7
  • 20
  • You might want to look at that first line of SQL...it's not SQL :) – Adam Robinson Jan 30 '11 at 21:29
  • I realize this, but I'm not trying to do it in SQL. That's just the format I would use if I were doing it in SQL. I wrote it that way so that people would have a better idea of what I was trying to do in LINQ. – legacybass Jan 31 '11 at 01:45

2 Answers2

12

If you're talking about performing client-side filtering, that's pretty easy with LINQ to Objects. Something like this:

List<Test> tests = ...;
List<Appointment> appts = ...;

var query = tests.Except(
            tests.Join(appts, t => t.name, a => a.testName, (t, a) => t));

The following is slightly simpler to read:

var query = tests.Where(t => !appts.Any(a => a.testName == t.name));

But the first version will be faster, since the Join function will compute a hash table of the matches rather than doing a linear search of the appts list for every element in tests.

Adam Robinson
  • 182,639
  • 35
  • 285
  • 343
  • The second answer posted made more sense to me so I decided to use that, but I'm curious about your post. Can you make multiple selection criteria inside the Where method? I wanted to be able to compare the tests' dates to see if the test can be taken yet and also check to make sure there wasn't already an appointment. I'm still a novice when it comes to using the => operator so I'm not sure how I would go about listing multiple criteria. – legacybass Jan 31 '11 at 02:05
  • @legacybass: Yes, you can include as many conditions as you want to in the `Where` portion. I posted this version since it's simpler (and faster) if you're going against a list of in-memory objects rather than creating a database query. – Adam Robinson Feb 01 '11 at 21:25
  • So would those be separated by commas or semicolons? How would you use multiple criteria in the Join method? That's the one I'd be most interested in learning how to use. – legacybass Feb 02 '11 at 18:28
  • @legacybass: The `Join` function is, unfortunately, not as robust as where. It works by creating a simple hash table that links the two rows by a single value. You *can* join on multiple criteria, but it can only be done on simple equality comparisons across the columns. If you want a specific example, it's probably better to ask another question about how to do multiple conditions in a join clause and I'll be happy to answer :) – Adam Robinson Feb 02 '11 at 18:44
  • Watch the case sensitivity: var query = tests.Where(t => !appts.Any(a => a.testName.ToLower() == t.name.ToLower())); – Jon Apr 16 '14 at 14:42
5

http://introducinglinq.com/blogs/marcorusso/archive/2008/01/14/the-not-in-clause-in-linq-to-sql.aspx

Consider this code that returns all the customers who don't have an order in the Orders table. This is one SQL query that returns that value.

SELECT *
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] NOT IN (
    SELECT [t1].[CustomerID]
    FROM [dbo].[Orders] AS [t1]
)

This is not the faster way to get the desired result (using a NOT EXISTS is the favorite way - more on this shortly). LINQ offers a Contains extension method that allows writing the following code.

NorthwindDataContext dc = new NorthwindDataContext();
dc.Log = Console.Out;
var query =
    from c in dc.Customers
    where !(from o in dc.Orders
            select o.CustomerID)
           .Contains(c.CustomerID)
    select c;
foreach (var c in query) Console.WriteLine( c );

In LINQ to SQL the query is translated into this SQL code:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
       [t0].[ContactTitle], [t0].[Address], [t0].[City],
       [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE NOT (EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [dbo].[Orders] AS [t1]
    WHERE [t1].[CustomerID] = [t0].[CustomerID]
    ))

This approach is not only semantically equivalent, but also faster in execution. The following is the result with SET STATISTICS IO ON. The first result is for the hand-written query that use the NOT IN clause. The second result is for the LINQ to SQL generated query.

Jakub Konecki
  • 45,581
  • 7
  • 87
  • 126
  • Unless I'm misreading the question, it looks like he's asking how to do client-side filtering (i.e. LINQ-to-Objects), not how to do the filtering in SQL – Adam Robinson Jan 30 '11 at 21:35
  • @Adam - LINQ query still should work, shouldn't it? Instead of dc.Customers and dc.Orders he can use his collections of objects. – Jakub Konecki Jan 30 '11 at 21:42
  • This is actually the information I was looking for. I must have searched 3 dozen tutorials and documentation sites and never found anything that related it like this. Precisely what I needed. Thanks! – legacybass Jan 31 '11 at 02:02