0

I have a class that models a person; among the properties apart from the usual is that person's mobile number

public class Person
{
    // Other properties elided here for brevity's sake
    public string PersonMobileNumber { get; set; }
}

That person is sent text messages that they may or may not respond to. I've defined a message thus:

public class SmsMessage
{
    public string NumberFrom { get; set;}
    public string NumberTo { get; set;}
    public string MessageContent { get; set; }
    public DateTime DateTimeReceived { get; set; }
}

Now what I want to see is who's responded to my text messages.

In plain English you have a number of SMS messages (List<SmsMessage>) and some people (List<Person>)

In the old pre-LINQ days I'd simply have foreached my way through both collections, with one foreach inside the first but I thought that LINQ should be able to help me here. The relevant statement that I've come up with looks as follows:

List<Person> peopleThatResponded =
    people.Where(p => smsMessages.Exists(s => s.NumberFrom == p.MobileNumber)).ToList();

It just feels that this will work fine for small data sets (this is based on debugging it and watching the cycles through the code) but I'm sure that there's a more performant way to do this. FWIW I have complete control over the class definitions and can make any changes that need making.

noonand
  • 2,763
  • 4
  • 26
  • 51

2 Answers2

2

A more performat way is to join them:

var peopleThatResponded = from pers in people
                          join sms in smsMessages
                          on pers.PersonMobileNumber equals sms.NumberFrom 
                          select pers;
List<Person> peopleListThatResponded = peopleThatResponded.ToList();

Why is LINQ JOIN so much faster than linking with WHERE?

But that would repeat people if they have responded multiple times. Another approach is to use a HashSet<string> instead of the list which is much more efficient.

var smsNumbers = new HashSet<string>(smsMessages.Select(sms => sms.NumberFrom));
List<Person> peopleThatResponded = people
    .Where(p => smsNumbers.Contains(p.PersonMobileNumber)).ToList();
Community
  • 1
  • 1
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Is this still the case when the collection is in memory and not within a database? The example the OP gave seemed to be an in memory collection. The link you provided seems to provide reasons as to why joins are faster when querying a database using link (via ORM). – BenjaminPaul Sep 03 '14 at 09:23
  • Confirming that these are two in-memory collections, if that makes a difference – noonand Sep 03 '14 at 09:25
  • @BenjaminPaul: no, the link i provided shows that `Enumerable.Join` is much more performant with `Linq-To-Objects`(in memory). Databases often optimize `Where` to `Join`. Note that i've edited my answer to provide another approach. – Tim Schmelter Sep 03 '14 at 09:29
  • @noonand: it matters, look at my last comment. – Tim Schmelter Sep 03 '14 at 09:36
  • Thanks @TimSchmelter, Was genuinely interested! – BenjaminPaul Sep 03 '14 at 09:40
1

Using Exists in a where is an O(n*m) solution, so that will get exponetially slower with more people or messages.

You can put the existing numbers in a hash set, and then match the people against that. That will be an O(n+m) solution, so it scales well:

HashSet<string> numbers =
  new HashSet<string>(smsMessages.Select(m => m.NumberFrom));

List<Person> peopleThatResponded =
  people.Where(p => numbers.Contains(p.MobileNumber)).ToList();
Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • 1
    Guffa, I think that @Tim Schmelter has beaten you to it timewise with the edit to his original answer. I appreciate the time you took to answer the question though. Thank you! – noonand Sep 03 '14 at 10:14