4

I need to search through a collection of "Item" objects that contain a time property. I have a fast solution in place but its very messy (will post if nothing better comes up). Below are my attempts at performing the search myself using LINQ and whatnot.

In my particular case, I know the items are ordered low to high based on time. When I iterate through them, it's 9/12, 9/13, 9/14. I'd like to find a solution that's fast even if this isn't ordered, but not important right now.

//ICollection c = GetCollection(); //25,000+ items
DateTime TIME = DateTime.Now.AddDays(-1);

EventLog scan = new EventLog("Application", "Server", "N/A");
EventLogCollection c = scan.Entries;
Console.WriteLine(logs.Count); // All entries already in list here

// 64 sec - SLOW
List<Item> l1 = new List<Item>();
foreach (Item i in c) { 
  if (i.time > TIME) { 
    l1.Add(i); }
}

// 93 sec - SLOWER
var l2 = c.Cast<Item>().AsParallel().Select(n => n.time > TIME);
var i = l2.Count();

// 98 sec - EVEN SLOWER!
List<Item> l3 = new List<Item>();
Parallel.ForEach(c.Cast<Item>(), n => {
    if (n.time > TIME) {
        l3.add(n);
    }
});

My current solution is do a BinarySearch for the start and end times and loop through the ICollection based on those indexes. It's very fast (1-2 sec) but very messy. I don't need a different solution but thought I'd toss this out to you performance experts.

Is there a faster, elegant to way to search the ICollection? BTW I have no control over the collection I'm given and can't change it's structure. .NET 4.0

And no, I can't use System.Diagnostics.Eventing.Reader because I'm stuck with Windows XP

user1002479
  • 235
  • 2
  • 4
  • 12
  • possible duplicate of [Can LINQ use binary search when the collection is ordered?](http://stackoverflow.com/questions/1766328/can-linq-use-binary-search-when-the-collection-is-ordered) – Mike Perrenoud Oct 01 '12 at 13:05
  • 1
    Honestly, I'm surprised it's taking 64 seconds for looping through 25,000 items only performing a cast, value check, and add. Is your `GetCollection` actually lazily pulling out records from a database or something? Or doing extra work when iterating or calculating its `time`? – Chris Sinclair Oct 01 '12 at 13:06
  • Out of curiosity, how do you perform random access to `ICollection` in order to run a binary search on it? – Sergey Kalinichenko Oct 01 '12 at 13:12
  • I'm not actually searching an ICollection... it's really an EventLogEntryCollection that contains the this[int index] { get;}. I'm curious if there's a better approach. Sorry for the confusion Chris- no. I'm getting a populated list... I think it's the Cast that's taking forever but haven't verified that yet. Still working on this – user1002479 Oct 01 '12 at 13:17
  • @user1002479: then look into how the indexer is implemented! Is it O(1)? Is it doing lazy loading? – TDaver Oct 01 '12 at 13:19
  • TDaver- I don't want to use the indexer. The list has all of the values in it already – user1002479 Oct 01 '12 at 13:21
  • 1
    without an indexer, how do you do a BinarySearch? Random Access (i.e. access by index) is needed for BinarySearch – TDaver Oct 01 '12 at 13:49
  • My current solution uses the indexer. Because of this, I now have a IComparer class and a sketchy BinarySearch method. I'm trying to find a solution that doesn't require the extra stuff. Pretend it's not there :) – user1002479 Oct 01 '12 at 13:54
  • `EventLogEntryCollection` supports an indexer, despite not implementing `IList` http://msdn.microsoft.com/en-us/library/system.diagnostics.eventlogentrycollection.item.aspx – Jodrell Oct 01 '12 at 14:20
  • I extended my answer with a binary search option which should be faster if you are looking more than 50 or so events. – Jodrell Oct 01 '12 at 15:57

3 Answers3

2

Your "query" is incorrect. You're actually selecting a bunch of booleans, but not filtering. You want:

var l2 = c.Cast<EventLogEntry>().AsParallel().Where(n => n.time > TIME);

This may not be faster, but it's worth a check. Heck, I wouldn't even do it in parallel to start with.

In fact, Count has an overload which takes a predicate, so you can use this:

var count = c.Cast<EventLogEntry>().Count(n => n.time > TIME);

Or a parallelized version:

var count = c.Cast<EventLogEntry>().AsParallel().Count(n => n.time > TIME);

Like Chris, I'm shocked that this would take over a minute for any implementation...

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
1

EDIT

On the assertion that dasblinkenlight's answer is generally right I wrote a quick generic helper function.

public static int BinaryFirstIndexOf<T>(
    Func<int, T> indexer,
    Func<T, boo> predicate,
    int count)
{
    var low = 0;
    var high = count - 1;

    while (low < (high - 1))
    {
        var mid = low + ((high - low) / 2);

        if (predicate(indexer(mid))
        {
            high = mid;
        }
        else
        {
            low = mid;
        }
    }

    if (predicate(indexer(low)))
    {
        return low;
    }

    if (low != high && predicate(indexer(high)))
    {
        return high;
    }

    return -1;
}

Which I would use like this,

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var first = BinaryFirstIndexOf(
     i => c[i], 
     e => e.TimeGenerated > time,
     c.Count);

if (first >= 0)
{
    var result = new List<Item>(c.Count - first);
    Enumerable.Range(first, c.Count - first).AsParallel()
    .ForAll(i => 
        {
            var j = i - first;
            result[j] = (Item)c[i];
        });
}

Don't you want to do something like this

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var cutOff = 0;
for (var i = c.Count - 1; i > - 1; i--)
{
    if (c[i].TimeGenerated < time)
    {
        cutOff = i;
        break;
    }
}

var result = new List<Item>(c.Count - cutOff - 1);
var j = 0;
for (var i = cutOff + 1; i < c.Count; i ++)
{
    result[j] = (Item)c[i];
    j++;
}

I'm assuming the data you want is on the end and takes up less that half the collection.

Or perhaps using linq to do the casting in parallel,

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var cutOff = 0;
for (var i = c.Count - 1; i > - 1; i--)
{
    if (c[i].TimeGenerated < time)
    {
        cutOff = i;
        break;
    }
}

var result = new List<Item>(c.Count - cutOff - 1);
Enumerable.Range(cutOff + 1, c.Count - cutOff - 1).AsParallel()
    .ForAll(i => 
        {
            var j = i - cutOff - 1;
            result[j] = (Item)c[i];
        });
Community
  • 1
  • 1
Jodrell
  • 34,946
  • 5
  • 87
  • 124
0

This first statement:

List<Item> l1 = new List<Item>();
foreach (Item i in c) { 
  if (i.time > TIME) { 
    l1.Add(i); }
}

Does it improve performance changing to (will filter the list before the foreach is run):

List<Item> l1 = new List<Item>();
foreach (Item i in c.Where(a => a.time > TIME)) { 
    l1.Add(i);
}
Chris Dixon
  • 9,147
  • 5
  • 36
  • 68
  • Yes, but only marginally. Down to 40 seconds. I can't actually call the .Where until I cast it. c.Cast.Where(...) – user1002479 Oct 01 '12 at 13:27