5

I've got an ILookup generated by some complicated expression. Let's say it's a lookup of people by last name. (In our simplistic world model, last names are unique by family)

ILookup<string, Person> families;

Now I've got two queries I'm interested in how to build.

First, how would I filter by last name?

var germanFamilies = families.Where(family => IsNameGerman(family.Key));

But here, germanFamilies is an IEnumerable<IGrouping<string, Person>>; if I call ToLookup() on it, I'd best bet would get an IGrouping<string, IGrouping<string, Person>>. If I try to be smart and call SelectMany first I'd end up with the computer doing a lot of unnecessary work. How would you convert this enumeration into a lookup easily?

Second, I'd like to get a lookups of adults only.

var adults = families.Select(family =>
         new Grouping(family.Key, family.Select(person =>
               person.IsAdult())));

Here I'm faced with two problems: the Grouping type doesn't exist (except as an internal inner class of Lookup), and even if it did we'd have the problem discussed above.

So, apart from implementing the ILookup and IGrouping interfaces completely, or make the computer do silly amounts of work (regrouping what has already been grouped), is there a way to alter existing ILookups to generate new ones that I missed?

configurator
  • 40,828
  • 14
  • 81
  • 115

3 Answers3

5

(I'm going to assume you actually wanted to filter by last name, given your query.)

You can't modify any implementation of ILookup<T> that I'm aware of. It's certainly possible to implement ToLookup with an immutable lookup, as you're clearly aware :)

What you could do, however, is to change to use a Dictionary<string, List<Person>>:

var germanFamilies = families.Where(family => IsNameGerman(family.Key))
                             .ToDictionary(family => family.Key,
                                           family.ToList());

That approach also works for your second query:

var adults = families.ToDictionary(family => family.Key,
                                   family.Where(person => persion.IsAdult)
                                         .ToList());

While that's still doing a bit more work than we might think necessary, it's not too bad.

EDIT: The discussion with Ani in the comments is worth reading. Basically, we're already going to be iterating over every person anyway - so if we assume O(1) dictionary lookup and insertion, we're actually no better in terms of time-complexity using the existing lookup than flattening:

var adults = families.SelectMany(x => x)
                     .Where(person => person.IsAdult)
                     .ToLookup(x => x.LastName);

In the first case, we could potentially use the existing grouping, like this:

// We'll have an IDictionary<string, IGrouping<string, Person>>
var germanFamilies = families.Where(family => IsNameGerman(family.Key))
                             .ToDictionary(family => family.Key);

That is then potentially much more efficient (if we have many people in each family) but means we're using groupings "out of context". I believe that's actually okay, but it leaves a slightly odd taste in my mouth, for some reason. As ToLookup materializes the query, it's hard to see how it could actually go wrong though...

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I was about to post something similar, but thinking about it, I don't really believe this much more efficient than a naïve re-ToLookup, at least in a time-complexity sense. Given a person, you should be able to push him / her into the right bucket in constant time. So the flattening of the original lookup doesn't really cost much. The bulk of the work in both cases should be the enumeration through *every* single person, the allocation, dynamic buffer resizing, copying etc. – Ani Jan 10 '11 at 20:07
  • @Ani: I suspect it's still significantly more efficient. For example, I believe that the MS implementation of IGrouping also implements `IList`, so the `ToList` call can use a bulk operation (in the first case, but not the second). Why would you want to perform a hash lookup for every item again, instead of just once per family? It may not be any better in terms of *complexity*, but that doesn't mean it's not worth doing. Those constant factors can be important :) – Jon Skeet Jan 10 '11 at 20:10
  • 1
    Fair enough, but I wanted to warn the OP about micro-optimizing. In this case, you're essentially suggesting that he abandon what he (quite reasonably) sees as the semantically ideal representation of his data for something else, in the interests of performance-gains that *may not be* very significant. Not that I personally see a problem with this approach or with using a `Dictionary>` instead of a lookup. The `ILookup` interface is not very useful at the moment. :) – Ani Jan 10 '11 at 20:20
  • @Ani: I missed something in the first query - we don't need to call ToList at all; we can keep the grouping. – Jon Skeet Jan 10 '11 at 20:26
  • @Jon Skeet, @Ani: what do you think about my answer ? I think I'm missing something... – digEmAll Jan 10 '11 at 20:50
  • My solution to the actual problem was actually not generating the original lookup in the first place but injecting the Where clause in there (which resulted in some duplicate code but what I think is saner filtering as it's done by the SQL server). Your solution of returning an `IDictionary>` makes me think there's a big, gaping hole in the framework when ILookup is involved... – configurator Jan 10 '11 at 22:11
  • @Jon Although I have to admin, the fact that the interface is defined in System.Core and not mscorlib should have been a big hint that ILookup is not actually supported by anything, but just there for .ToLookup to make sense... Does anything other than ToLookup reference ILookup? – configurator Jan 10 '11 at 22:14
  • @configurator: Not that I'm aware of. On the other hand, it's a nicely read-only interface for a change... – Jon Skeet Jan 10 '11 at 23:09
  • @GoldenLion: No. This question is *over 11 years old*. If you want to experiment with lookups, you should be able to easily find a data set that's interesting for you. You shouldn't expect Stack Overflow answers to be complete tutorials/code-labs/all-in-one resources to learn *everything* about a topic. – Jon Skeet Nov 15 '22 at 06:57
2

For your first query, what about implementing your own FilteredLookup able to take advantage of coming from another ILookup ?
(thank to Jon Skeet for the hint)

public static ILookup<TKey, TElement> ToFilteredLookup<TKey, TElement>(this ILookup<TKey, TElement> lookup, Func<IGrouping<TKey, TElement>, bool> filter)
{
    return new FilteredLookup<TKey, TElement>(lookup, filter);
}

With FilteredLookup class being:

internal sealed class FilteredLookup<TKey, TElement> : ILookup<TKey, TElement>
{
    int count = -1;
    Func<IGrouping<TKey, TElement>, bool> filter;
    ILookup<TKey, TElement> lookup;

    public FilteredLookup(ILookup<TKey, TElement> lookup, Func<IGrouping<TKey, TElement>, bool> filter)
    {
        this.filter = filter;
        this.lookup = lookup;
    }

    public bool Contains(TKey key)
    {
        if (this.lookup.Contains(key))
            return this.filter(this.GetGrouping(key));
        return false;
    }

    public int Count
    {
        get
        {
            if (count >= 0)
                return count;
            count = this.lookup.Where(filter).Count();
            return count;
        }
    }

    public IEnumerable<TElement> this[TKey key]
    {
        get
        {
            var grp = this.GetGrouping(key);
            if (!filter(grp))
                throw new KeyNotFoundException();
            return grp;
        }
    }

    public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator()
    {
        return this.lookup.Where(filter).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private IGrouping<TKey, TElement> GetGrouping(TKey key)
    {
        return new Grouping<TKey, TElement>(key, this.lookup[key]);
    }
}

and Grouping:

internal sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    private readonly TKey key;
    private readonly IEnumerable<TElement> elements;

    internal Grouping(TKey key, IEnumerable<TElement> elements)
    {
        this.key = key;
        this.elements = elements;
    }

    public TKey Key { get { return key; } }

    public IEnumerator<TElement> GetEnumerator()
    {
        return elements.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

So basically your first query will be:

var germanFamilies = families.ToFilteredLookup(family => IsNameGerman(family.Key));

This allows you to avoid re-flattening-filtering-ToLookup, or creating a new dictionary (and so hashing keys again).

For the second query the idea will be similar, you should just create a similar class not filtering for the whole IGrouping but for the elements of the IGrouping.

Just an idea, maybe it could not be faster than other methods :)

digEmAll
  • 56,430
  • 9
  • 115
  • 140
  • Nice. This is a good way to deal with the first query, especially since the filter is cheap. A similar technique for the second query would have to be a lot more clever in terms of memoization. – Ani Jan 11 '11 at 00:52
  • I haven't looked at it fully, but it looks good. One question: can't you memoize the list of filtered groups instead of just the filtered count? This could be reused by `Count` as well as by `GetEnumerator`. If you memoized it to a `Dictionary>` (or similar), it could even be shared by `Contains`. Of course, you should then give up holding onto a reference to the original lookup in the interests of space. This gets real complicated real fast. :) – Ani Jan 11 '11 at 00:53
  • Essentially, what I'm suggesting is a strategy that moves to Jon Skeet's answer and *stays there* the moment you *know* you have to run the filter on everything. – Ani Jan 11 '11 at 01:01
  • @Ani: yes memoization could be a good idea, a bit complicated maybe :) – digEmAll Jan 11 '11 at 07:54
0

The Lookup creates an index with a Key type and a value type generic indexer. You can added to a lookup and remove from a lookup by using concat for add and iterate and removing the key items in a temp list then rebuilding the lookup. The look up then works like a dictionary by retrieving the value type by a key.

public async Task TestILookup()
{
    // Lookup<TKey,TElement>
    List<Product> products = new List<Product>
        {
            new Product { ProductID = 1, Name = "Kayak", Category = "Watersports", Price = 275m },
            new Product { ProductID = 2, Name = "Lifejacket", Category = "Watersports", Price = 48.95m },
            new Product { ProductID = 3, Name = "Soccer Ball", Category = "Soccer", Price = 19.50m },
            new Product { ProductID = 4, Name = "Corner Flag", Category = "Soccer", Price = 34.95m }
         };

    // create an indexer
    ILookup<int, Product> lookup = (Lookup<int,Product>) products.ToLookup(p => p.ProductID, p => p);

    Product newProduct = new Product { ProductID = 5, Name = "Basketball", Category = "Basketball", Price = 120.15m };

    lookup = lookup.SelectMany(l => l)
                   .Concat(new[] { newProduct })
                   .ToLookup(l => l.ProductID, l=>l);

    foreach (IGrouping<int, Product> packageGroup in lookup)
    {
        // Print the key value of the IGrouping.
        output.WriteLine("ProductID Key {0}",packageGroup.Key);

        // Iterate over each value in the IGrouping and print its value.
        foreach (Product product in packageGroup)
            output.WriteLine("Name {0}", product.Name);
    }

    Assert.Equal(lookup.Count(), 5);
}

public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
}

Output:

ProductID Key 1
Name Kayak
ProductID Key 2
Name Lifejacket
ProductID Key 3
Name Soccer Ball
ProductID Key 4
Name Corner Flag
ProductID Key 5
Name Basketball
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Golden Lion
  • 3,840
  • 2
  • 26
  • 35