Linq query to join against list in a struct

Question

I have a dictionary of struct, where one member is a list containing varying elements applicable to each dictionary item.

I would like to join these elements against each item, in order to filter them and/or group them by element.

In SQL I'm familiar with joining against tables/queries to obtain multiple rows as desired, but I'm new to C#/Linq. Since a "column" can be an object/list already associated with the proper dictionary items, I wonder how I can use them to perform a join?

Here's a sample of the structure:

name   elements
item1  list: elementA
item2  list: elementA, elementB

I would like a query that gives this output (count = 3)

name   elements
item1  elementA
item2  elementA
item2  elementB

Ultimately, I want to group them like this:

   element    count
   ElementA   2
   ElementB   1

Here's my starting code, which only counts the dictionary items.

    public struct MyStruct
    {
        public string name;
        public List<string> elements;
    }

    private void button1_Click(object sender, EventArgs e)
    {
        MyStruct myStruct = new MyStruct();
        Dictionary<String, MyStruct> dict = new Dictionary<string, MyStruct>();

        // Populate 2 items
        myStruct.name = "item1";
        myStruct.elements = new List<string>();
        myStruct.elements.Add("elementA");
        dict.Add(myStruct.name, myStruct);

        myStruct.name = "item2";
        myStruct.elements = new List<string>();
        myStruct.elements.Add("elementA");
        myStruct.elements.Add("elementB");
        dict.Add(myStruct.name, myStruct);


        var q = from t in dict
                select t;

        MessageBox.Show(q.Count().ToString()); // Returns 2
    }

Edit: I don't really need the output to be a dictionary. I used one to store my data because it works well and prevents duplicates (I do have a unique item.name, which I store as the key). However, for the purpose of filtering/grouping, I guess it could be a list or array without issues. I can always do .ToDictionary where key = item.Name afterwards.

@sinanakyazici the question does not specify that the output must be stored in a dictionary (and, indeed, as you correctly note, it cannot be). — phoog, Feb 29 '12 at 06:37
@phoog you are right. I misunderstood. So I deleted my comment. — Sinan AKYAZICI, Feb 29 '12 at 06:38
@sinanakyazici for some reason I can't delete (nor edit) my comment on my phone's browser :( — phoog, Feb 29 '12 at 06:41

score 3 · Accepted Answer · edited May 23 '17 at 11:48

var q = from t in dict
    from v in t.Value.elements
    select new { name = t.Key, element = v };

The method here is Enumerable.SelectMany. Using extension method syntax:

var q = dict.SelectMany(t => t.Value.elements.Select(v => new { name = t.Key, element = v }));

EDIT

Note that you could also use t.Value.name above, instead of t.Key, since these values are equal.

So, what's going on here?

The query-comprehension syntax is probably easiest to understand; you can write an equivalent iterator block to see what's going on. We can't do that simply with an anonymous type, however, so we'll declare a type to return:

class NameElement
{
    public string name { get; set; }
    public string element { get; set; }
}
IEnumerable<NameElement> GetResults(Dictionary<string, MyStruct> dict)
{
    foreach (KeyValuePair<string, MyStruct> t in dict)
        foreach (string v in t.Value.elements)
            yield return new NameElement { name = t.Key, element = v };
}

How about the extension method syntax (or, what's really going on here)?

(This is inspired in part by Eric Lippert's post at https://stackoverflow.com/a/2704795/385844; I had a much more complicated explanation, then I read that, and came up with this:)

Let's say we want to avoid declaring the NameElement type. We could use an anonymous type by passing in a function. We'd change the call from this:

var q = GetResults(dict);

to this:

var q = GetResults(dict, (string1, string2) => new { name = string1, element = string2 });

The lambda expression (string1, string2) => new { name = string1, element = string2 } represents a function that takes 2 strings -- defined by the argument list (string1, string2) -- and returns an instance of the anonymous type initialized with those strings -- defined by the expression new { name = string1, element = string2 }.

The corresponding implementation is this:

IEnumerable<T> GetResults<T>(
    IEnumerable<KeyValuePair<string, MyStruct>> pairs,
    Func<string, string, T> resultSelector)
{
    foreach (KeyValuePair<string, MyStruct> pair in pairs)
        foreach (string e in pair.Value.elements)
            yield return resultSelector.Invoke(pair.Key, e);
}

Type inference allows us to call this function without specifying T by name. That's handy, because (as far as we are aware as C# programmers), the type we're using doesn't have a name: it's anonymous.

Note that the variable t is now pair, to avoid confusion with the type parameter T, and v is now e, for "element". We've also changed the type of the first parameter to one of the interfaces that the dictionary implements, IEnumerable<KeyValuePair<string, MyStruct>>. It's wordier, but it makes the method more useful, and it will be helpful in the end. As the type is no longer a dictionary type, we've also changed the name of the parameter from dict to pairs.
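Putting the pieces so far together, here is a self-contained sketch of the generic method being called with an anonymous type. It assumes the MyStruct type from the question; the wrapper class InferenceDemo is a made-up name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct MyStruct
{
    public string name;
    public List<string> elements;
}

public static class InferenceDemo
{
    // Yields one result per (key, element) combination; the caller decides the shape of T.
    public static IEnumerable<T> GetResults<T>(
        IEnumerable<KeyValuePair<string, MyStruct>> pairs,
        Func<string, string, T> resultSelector)
    {
        foreach (KeyValuePair<string, MyStruct> pair in pairs)
            foreach (string e in pair.Value.elements)
                yield return resultSelector(pair.Key, e);
    }

    public static void Main()
    {
        var dict = new Dictionary<string, MyStruct>
        {
            { "item1", new MyStruct { name = "item1", elements = new List<string> { "elementA" } } },
            { "item2", new MyStruct { name = "item2", elements = new List<string> { "elementA", "elementB" } } }
        };

        // T is never written out: the compiler infers it from the lambda's return type.
        var q = GetResults(dict, (n, el) => new { name = n, element = el });

        foreach (var row in q)
            Console.WriteLine(row.name + "  " + row.element);
    }
}
```

The lambda's parameter types are fixed by Func<string, string, T>, so the only thing left to infer is T, which becomes the anonymous type.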

We could generalize this further. The second foreach has the effect of projecting a key-value pair to a sequence of type T. That whole effect could be encapsulated in a single function; the delegate type would be Func<KeyValuePair<string, MyStruct>, IEnumerable<T>>. The first step is to refactor the method so we have a single statement that converts the key-value pair into a sequence, using the Select method to invoke the resultSelector delegate:

IEnumerable<T> GetResults<T>(
    IEnumerable<KeyValuePair<string, MyStruct>> pairs,
    Func<string, string, T> resultSelector)
{
    foreach (KeyValuePair<string, MyStruct> pair in pairs)
        foreach (T result in pair.Value.elements.Select(e => resultSelector.Invoke(pair.Key, e)))
            yield return result;
}

Now we can easily change the signature:

IEnumerable<T> GetResults<T>(
    IEnumerable<KeyValuePair<string, MyStruct>> pairs,
    Func<KeyValuePair<string, MyStruct>, IEnumerable<T>> resultSelector)
{
    foreach (KeyValuePair<string, MyStruct> pair in pairs)
        foreach (T result in resultSelector.Invoke(pair))
            yield return result;
}

The call site now looks like this; notice how the lambda expression now incorporates the logic that we removed from the method body when we changed its signature:

var q = GetResults(dict, pair => pair.Value.elements.Select(e => new { name = pair.Key, element = e }));

To make the method more useful (and its implementation less verbose), let's replace the type KeyValuePair<string, MyStruct> with a type parameter, TSource. We'll change some other names at the same time:

T     -> TResult
pairs -> sourceSequence
pair  -> sourceElement

And, just for kicks, we'll make it an extension method:

static IEnumerable<TResult> GetResults<TSource, TResult>(
    this IEnumerable<TSource> sourceSequence,
    Func<TSource, IEnumerable<TResult>> resultSelector)
{
    foreach (TSource sourceElement in sourceSequence)
        foreach (TResult result in resultSelector.Invoke(sourceElement))
            yield return result;
}

And there you have it: SelectMany! Well, the function still has the wrong name, and the actual implementation includes validation that the source sequence and the selector function are non-null, but that's the core logic.

From MSDN: SelectMany "projects each element of a sequence to an IEnumerable and flattens the resulting sequences into one sequence."
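As a side note, a query expression with two from clauses is translated by the compiler into the SelectMany overload that takes both a collection selector and a result selector, rather than the nested Select form shown earlier. A minimal sketch of that overload, assuming the MyStruct type from the question (FlattenDemo and Flatten are illustrative names, and the "name/element" string format is just a convenient stand-in for the anonymous type):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct MyStruct
{
    public string name;
    public List<string> elements;
}

public static class FlattenDemo
{
    // Hypothetical helper: returns one "name/element" string per list entry.
    public static List<string> Flatten(Dictionary<string, MyStruct> dict)
    {
        return dict
            .SelectMany(
                t => t.Value.elements,        // collection selector: the inner list of each item
                (t, v) => t.Key + "/" + v)    // result selector: combines the outer item and inner element
            .ToList();
    }

    public static void Main()
    {
        var dict = new Dictionary<string, MyStruct>
        {
            { "item1", new MyStruct { name = "item1", elements = new List<string> { "elementA" } } },
            { "item2", new MyStruct { name = "item2", elements = new List<string> { "elementA", "elementB" } } }
        };

        foreach (string row in Flatten(dict))
            Console.WriteLine(row);
    }
}
```

Both forms produce the same rows; the two-selector overload simply avoids the inner Select call.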

With your first answer, I get An expression of type 'MyStruct' is not allowed in a subsequent from clause in a query expression with source type 'Dictionary'. Type inference failed in the call to 'SelectMany'. — mtone, Feb 29 '12 at 06:50
And the second, I get 'MyStruct' does not contain a definition for 'Select' and no extension method 'Select' accepting a first argument of type 'MyStruct' could be found — mtone, Feb 29 '12 at 06:51
wow thanks, it works just fine! If you have the time, can you explain briefly what is happening here, or point me to articles/documentation that would help me understand? — mtone, Feb 29 '12 at 07:06
@mtone I've added an explanation. Please let me know which parts of it are unclear, and I'll try to clarify them. Thanks! — phoog, Mar 01 '12 at 01:24

Despertar · Answer 2 · 2012-02-29T07:16:03.907

This flattens the arrays into a single sequence, then groups equal values so that each distinct value can be counted.

var groups = dictionary
    .SelectMany(o => o.Value)
    .GroupBy(o => o);

foreach (var g in groups)
    Console.WriteLine(g.Key + ": " + g.Count());

Using the following dictionary:

Dictionary<string, string[]> dictionary = new Dictionary<string, string[]>();
dictionary.Add("One", new string[] { "A" });
dictionary.Add("Two", new string[] {"A", "B" });
dictionary.Add("Three", new string[] { "A", "B" });

I get this output:

 A: 3
 B: 2
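With the question's Dictionary<string, MyStruct>, the values are structs rather than arrays, so the selector has to reach into the elements field. A sketch under that assumption, which also collects the counts into a dictionary keyed by element (CountDemo and CountByElement are illustrative names, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct MyStruct
{
    public string name;
    public List<string> elements;
}

public static class CountDemo
{
    // Flattens every elements list, groups equal strings, and counts each group.
    public static Dictionary<string, int> CountByElement(Dictionary<string, MyStruct> dict)
    {
        return dict
            .SelectMany(o => o.Value.elements)
            .GroupBy(e => e)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var dict = new Dictionary<string, MyStruct>
        {
            { "item1", new MyStruct { name = "item1", elements = new List<string> { "elementA" } } },
            { "item2", new MyStruct { name = "item2", elements = new List<string> { "elementA", "elementB" } } }
        };

        foreach (KeyValuePair<string, int> entry in CountByElement(dict))
            Console.WriteLine(entry.Key + ": " + entry.Value);
    }
}
```

Because GroupBy yields each key exactly once, the ToDictionary call cannot hit a duplicate-key error here.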

answered Feb 29 '12 at 06:27


You can't create the second dictionary because its keys wouldn't be unique. – phoog Feb 29 '12 at 06:35
Thank you for pointing that out, I have updated my answer to solve the long-term goal. – Despertar Feb 29 '12 at 07:05
Thank you, this indeed provides the proper grouped count. For now, I think I prefer doing it in 2 steps (expand, then group), but I'll certainly keep this in mind. Thanks again! – mtone Feb 29 '12 at 07:13
You're welcome. LINQ does take a little bit to get used to but the more you use it the more you'll like it and it will just feel natural. And if you like LINQ check out Rx (Reactive Extensions). Instead of IEnumerable lists you have a stream of data (implementing IObservable) you can query with LINQ and subscribe actions to be performed on each item as it comes in, pretty cool stuff. http://msdn.microsoft.com/en-us/data/gg577609 – Despertar Feb 29 '12 at 07:25

score 1 · Answer 3 · answered Feb 29 '12 at 06:32

/* Will return 
name   elements
item1  elementA
item2  elementA
item2  elementB 
*/
var res = dict
    .Values
    .SelectMany(m => m.elements.Select(e => new {m.name, element= e}))
    .ToArray();

/* Will return 
element    count
ElementA   2
ElementB   1 
*/
var res2 = res
    .GroupBy(r => r.element)
    .Select(g => new {element = g.Key, count = g.Count()})
    .ToArray();

SynXsiS


Thanks a lot! This works, and is very readable. I'll certainly work with it further. – mtone Feb 29 '12 at 07:30

score 0 · Answer 4 · answered Feb 29 '12 at 06:26

What if you use another dictionary for that?

Dictionary<String, string> dict2 = new Dictionary<string, string>();

foreach (MyStruct item in dict.Values)
    item.elements.ForEach(elem => dict2.Add(elem, item.name));

You can then query the new dictionary to get the count. It has the element as the key, so for each element it holds the items that contained it. Thus you can find how many items had the element you want.

Jayanga


That won't work because the keys in the second dictionary wouldn't be unique. – phoog Feb 29 '12 at 06:34

kaj · Answer 5 · 2012-02-29T09:09:29.623

You may want to start from a simpler collection of structs, but starting from your dictionary:

var q = from t in dict.Values
        from el in t.elements
        group el by el into eNameGroup
        select new { Name = eNameGroup.Key, Count = eNameGroup.Count() };

This returns:

Name      Count
elementA  2
elementB  1

answered Feb 29 '12 at 06:53


score 0 · Answer 6 · answered Mar 13 '12 at 05:11

If what you're after is grouping/pivoting, this could be done more declaratively by leveraging LINQ's grouping and avoiding dictionaries altogether:

void Main()
{
    var items = new MyStruct[] { 
        new MyStruct { name = "item1", elements = new List<string> { "elementA" }},
        new MyStruct { name = "item2", elements = new List<string> { "elementA", "elementB" }}};

    var groupedByElement =
        from item in items
        from element in item.elements
        group item by element;

    groupedByElement.Dump(); // items grouped by element value, (pivoted)

    var elementsWithCount =
        from gj in groupedByElement
        select new { element = gj.Key, count = gj.Count() };

    elementsWithCount.Dump();
    // element, count
    // elementA, 2
    // elementB, 1
}

public struct MyStruct
{
    public string name;
    public List<string> elements;
}

BTW, this answer was written in LINQPad. The Dump calls are LINQPad's way of displaying the output. — devgeezer, Mar 13 '12 at 13:44
