0

I decided to pour database records into List<> model and use enumerable Linq to get record from it. It have 141,856 records in it. What we found instead is it is pretty slow.

So, any suggestion or recommendation on making it run very quickly?

public class Geography
{
    public string Zipcode { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

var geography = new List<Geography>();

geography.Add(new Geography() { Zipcode = "32245", City = "Jacksonville", State = "Florida" });
geography.Add(new Geography() { Zipcode = "00001", City = "Atlanta", State = "Georgia" });

var result = geography.Where(x => (string.Equals(x.Zipcode, "32245", String Comparison.InvariantCulterIgnoreCase))).FirstOrDefault();

When we have 86,000 vehicles in Inventory and we want to use parallel task to get it done quickly but it become very slow when geography is being looked up.

await Task.WhenAll(vehicleInventoryRecords.Select(async inventory =>
{
    var result = geography.Where(x => (string.Equals(x.Zipcode, inventory.Zipcode, String Comparison.InvariantCulterIgnoreCase))).FirstOrDefault();

}));
fletchsod
  • 3,560
  • 7
  • 39
  • 65
  • Holding the 141,856 in memory is a pretty bad idea. You really should be applying this filter to the `IQueryable` if you are working with EnityFramework or in your query as a parameter if you are not using a ORM. – Scott Chamberlain Mar 12 '16 at 19:45
  • Please show your code you did for the database query, that should not not happen so there is some other kind of issue is happening. – Scott Chamberlain Mar 12 '16 at 19:53
  • Its really good at managing resource if it is configured correctly. Either it is incorrectly configured or you have poor queries. – Scott Chamberlain Mar 12 '16 at 20:01
  • Are your ZIP codes unique? I added an answer in case they aren't, although this information should be in the question – Gediminas Masaitis Mar 12 '16 at 20:57

2 Answers2

2

Use dictionary<string, Geography> to store geography data. Looking up data in dictionary by key is O(1) operation while for list it is O(n)

Giorgi
  • 30,270
  • 13
  • 89
  • 125
  • I might be nitpicking, but technically, `Dictionary` lookup performance is O(n), since it resolves hash collisions by putting multiple values into lists, which it has to iterate over. It can only be guaranteed to be O(1) for some specific types like where the value can be directly translated to an `int`. If the key type has a `GetHashCode` function which always returns the same value, the `Dictionary` lookup performance will be the same as a `List`. – Gediminas Masaitis Mar 12 '16 at 19:37
  • @GediminasMasaitis no, it is not O(n). it is amortized O(1). Assuming a resonable GetHashCode() implmentation of the key it will be O(1), it only fails to `O(n)` when you have a extreemly bad hash code implementation (which String does not) – Scott Chamberlain Mar 12 '16 at 19:39
  • in this way you can not have multiple zipcodes... zipcodes are not unique(just did quick search). – M.kazem Akhgary Mar 12 '16 at 19:40
  • @ScottChamberlain You're right, I didn't know that term! Here's a [good explanation](http://stackoverflow.com/questions/200384/constant-amortized-time) for others who may be thinking the way I've been thinking before. – Gediminas Masaitis Mar 12 '16 at 19:50
  • That's a fine idea. Unfortunately I get duplicate zipcode error on dictionary key. :-/ – fletchsod Mar 12 '16 at 20:05
  • In that case you can use List of Geography objects as value of the dictionary item. – Giorgi Mar 12 '16 at 21:01
0

You haven't mentioned if your ZIP codes are unique, so I'll assume they aren't. If they are - look at Giorgi's answer and skip to part 2 of my answer.

1. Use lookups

Since you're looking up your geography list multiple times by the same property, you should group the values by Zipcode. You can do this easily by using ToLookup - this will create a Lookup object. It is similar to a Dictionary, except it can multiple values as it's value. Passing a StringComparer.InvariantCultureIgnoreCase as the second parameter to your ToLookup will make it case-insensitive.

var geography = new List<Geography>();
geography.Add(new Geography { Zipcode = "32245", City = "Jacksonville", State = "Florida" });
geography.Add(new Geography { Zipcode = "00001", City = "Atlanta", State = "Georgia" });

var geographyLookup = geography.ToLookup(x => x.Zipcode, StringComparer.InvariantCultureIgnoreCase);
var result = geographyLookup["32245"].FirstOrDefault();

This should increase your performance considerably.

2. Parallelize with PLINQ

The way you parallelize your lookups is questionable. Luckily, .NET has PLINQ. You can use AsParallel and a parallel Select to asynchronously iterate over your vehicleInventoryRecords like this:

var results = vehicleInventoryRecords.AsParallel().Select(x => geographyLookup[x].FirstOrDefault());

Using Parallel.ForEach is another good option.

Community
  • 1
  • 1
Gediminas Masaitis
  • 3,172
  • 14
  • 35