
I have a property called SelectedSection that is assigned from a collection of Sections. Each Section holds a collection of BidItems that contains 1,000+ items. When I select a Section, I need to refresh the collection of items that the view is data-bound to with a filtered set of items.

public Section SelectedSection
{
    get 
    { 
        return selectedSection; 
    }

    set 
    {
        this.SetPropertyByReference(ref this.selectedSection, value);

        if (value != null)
        {
            this.BidItems = value.BidItems
                .Where(item =>
                    !item.Description.ToLower().Contains("flagger") &&
                    !item.Description.ToLower().Contains("civilian flagger") &&
                    !item.Description.ToLower().Contains("law enforcement"))
                .ToList();
        }

        this.MptPayment.EditedItem.DiaryPayItem.Section = value;
    }
}

I have to filter out about a dozen different types of items (I only show three for clarity's sake). In my Where clause, I convert the description to lower case before checking whether it contains each of the terms I'm filtering out.

I realize this is going to generate a lot of garbage: every one of the 1,000+ items in the collection gets a new string created for its lower-cased Description, and because ToLower() appears in every clause, that happens about a dozen times per item. My question is, would doing that be more expensive than simply checking all of the known case variations? Disregard the fact that I might miss a variation; I'm more interested in the theory of which is faster.

  1. Flagger
  2. flagger
  3. FLAGGER

The above list shows all of the known variations. I'm wondering which would be the more expensive route. Iterating over the collection and checking each of the known variations would be fast enough, without the overhead of generating so much garbage. It's either enumerate each description more than once in order to find all the variations, or enumerate it once while creating garbage strings on the heap along the way that will be GC'd. A sketch of the variation-checking route is below.
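
For concreteness, a minimal sketch of that known-variations route, assuming ordinal (case-sensitive) Contains checks and with the remaining terms and variations elided:

// Sketch only: case-sensitive Contains against each known spelling,
// so no lower-cased copies of Description are ever allocated.
this.BidItems = value.BidItems
    .Where(item =>
        !item.Description.Contains("Flagger") &&
        !item.Description.Contains("flagger") &&
        !item.Description.Contains("FLAGGER"))
    .ToList();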

Note that this property could be re-set several dozen times as the user performs their work. So a lot (tens of thousands) of string comparisons will be performed.

I realize this is a cheap operation relative to the rest of the app; I want to know for my own education rather than because I'm worried about a performance hit in my actual application.

Johnathon Sullinger
  • Unrelated to the question, but you realize "civilian flagger" and "flagger" are going to have overlap, right? There's no need for "civilian flagger" because "flagger" will catch it by default. – Johnny Bones Jul 28 '15 at 17:01
  • Why not use [case insensitive compare](https://stackoverflow.com/questions/444798/case-insensitive-containsstring)? – Saragis Jul 28 '15 at 17:03
  • Yeah, I don't actually check for "flagger" in my source as I needed to be more specific in my criteria. I just added it for my example. Thanks for the tip tho – Johnathon Sullinger Jul 28 '15 at 17:03
  • Seems very easy to optimize using a `let lowerString = item.Description.ToLower()` in the Linq statement... – Ron Beyer Jul 28 '15 at 17:03
  • How is that different than the .ToLower() I'm already using? – Johnathon Sullinger Jul 28 '15 at 17:04
  • @JohnathonSullinger Ron is saying that you should call `ToLower` once and use the result in the multiple `Contains` calls. In method syntax `item => {var lower = item.Description.ToLower(); return !lower.Contains...;}` – juharr Jul 28 '15 at 17:05
  • @JohnathonSullinger, if you were responding to me, comparing with `CompareOptions.IgnoreCase` should be faster, since it does not involve creating new strings. – Saragis Jul 28 '15 at 17:08
  • Ultimately the best thing to do here is try the different approaches, time them, and watch the memory usage to see how they preform. You could then ask about the results to get a better understanding. – juharr Jul 28 '15 at 17:10
  • @Saragis Thanks! I wasn't aware of that one. My comment was to RonBeyer – Johnathon Sullinger Jul 28 '15 at 17:11
  • @juharr Ah, I can see that. That would solve the multiple enumeration per-word issue I was thinking I would have. – Johnathon Sullinger Jul 28 '15 at 17:13
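
What Ron Beyer and juharr describe in the comments, sketched out (this is a reading of their suggestion, not code from the thread): lower-case each Description once per item and reuse it for every Contains check, so the allocation drops to one string per item.

// Sketch of the comment suggestion: one ToLower() per item, reused for all checks.
this.BidItems = value.BidItems
    .Where(item =>
    {
        string lower = item.Description.ToLower();
        return !lower.Contains("flagger") &&
               !lower.Contains("civilian flagger") &&
               !lower.Contains("law enforcement");
    })
    .ToList();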

1 Answer


Calling ToLower() on every item of a large collection creates unnecessary GC pressure. Instead, use IndexOf(...) >= 0 with StringComparison.OrdinalIgnoreCase:

this.BidItems = value.BidItems
    .Where(item =>
        !(item.Description.IndexOf("flagger", StringComparison.OrdinalIgnoreCase) >= 0) &&
        !(item.Description.IndexOf("civilian flagger", StringComparison.OrdinalIgnoreCase) >= 0) &&
        !(item.Description.IndexOf("law enforcement", StringComparison.OrdinalIgnoreCase) >= 0))
    .ToList();

A personal story: while profiling our app, I saw that a method responsible for XML deserialization was emitting a huge number of strings, causing the app to allocate around 1.5 GB per run. This method was on our hot path. It turned out one of the programmers was calling ToLower on each parsing iteration. Removing that call saved more than 1 GB of allocations per run.

As your collection gets larger, you'll see this causing more and more GC pressure. If you can avoid it, do so.
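
Since the question mentions a dozen or so terms, one way to keep this readable (a sketch, not part of the original answer; the ExcludedTerms name and the term list shown are illustrative) is to put the terms in an array and test them with Any:

// Illustrative sketch: the excluded terms live in one shared array,
// and each item is kept only if its Description matches none of them.
private static readonly string[] ExcludedTerms =
{
    "flagger",
    "civilian flagger",
    "law enforcement"
    // ...remaining terms
};

// Inside the SelectedSection setter:
this.BidItems = value.BidItems
    .Where(item => !ExcludedTerms.Any(term =>
        item.Description.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0))
    .ToList();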

Yuval Itzchakov