7

I am using LINQ-to-Entities, and would like to perform a pivot.

For exampe, I have this table:

| data1 | data2 |
+-------+-------+
|   1   |   A   |
|   1   |   B   |
|   2   |   P   |
|   2   |   Q   |
|   2   |   R   |
+---------------+

And I want to pivot it into the following results:

| data1 | first | second | third |
+-------+-------+--------+-------+
|   1   |   A   |   B    | NULL  |
|   2   |   P   |   Q    |   R   |
+--------------------------------+

I would like to do this in LINQ, without needing to do client-side processing.

I have seen these SO posts, but they do not quite address the above situation (as far as I can tell).



Note I have tried the below, but it complains that I cannot use Skip() on an unordered collection, and I don't see a way to get the group's collapsed 'data2' info sorted.

from item in MyTable
group item by item.data1 into g
select new
{
    data1 = g.Key,
    first = g.Skip(0).FirstOrDefault().data2,
    second = g.Skip(1).FirstOrDefault().data2,
    third = g.Skip(2).FirstOrDefault().data2,
};
Community
  • 1
  • 1
jwd
  • 10,837
  • 3
  • 43
  • 67
  • The second one is a closer duplicate of this: [is-it-possible-to-pivot-data-using-linq](http://stackoverflow.com/questions/167304/is-it-possible-to-pivot-data-using-linq) – nawfal Oct 18 '13 at 06:21

2 Answers2

0

Hmm, this seems to work, though I wonder how efficient it is.

from item in MyTable
group item by item.data1 into g
select new
{
    data1 = g.Key,
    first = g.OrderBy(x => x.data2).Skip(0).FirstOrDefault().data2,
    second = g.OrderBy(x => x.data2).Skip(1).FirstOrDefault().data2,
    third = g.OrderBy(x => x.data2).Skip(2).FirstOrDefault().data2,
};

The corresponding SQL generated (from LINQPad) is:

SELECT [t1].[data1], (
    SELECT [t5].[data2]
    FROM (
        SELECT TOP (1) [t4].[data2]
        FROM (
            SELECT [t3].[data2], [t3].[ROW_NUMBER]
            FROM (
                SELECT ROW_NUMBER() OVER (ORDER BY [t2].[data2]) AS [ROW_NUMBER], [t2].[data2]
                FROM [MyTable] AS [t2]
                WHERE [t1].[data1] = [t2].[data1]
                ) AS [t3]
            WHERE [t3].[ROW_NUMBER] > @p0
            ) AS [t4]
        ORDER BY [t4].[ROW_NUMBER]
        ) AS [t5]
    ) AS [first], (
    SELECT [t10].[data2]
    FROM (
        SELECT TOP (1) [t9].[data2]
        FROM (
            SELECT [t8].[data2], [t8].[ROW_NUMBER]
            FROM (
                SELECT ROW_NUMBER() OVER (ORDER BY [t7].[data2]) AS [ROW_NUMBER], [t7].[data2]
                FROM (
                    SELECT [t6].[data2]
                    FROM [MyTable] AS [t6]
                    WHERE [t1].[data1] = [t6].[data1]
                    ) AS [t7]
                ) AS [t8]
            WHERE [t8].[ROW_NUMBER] > @p1
            ) AS [t9]
        ORDER BY [t9].[ROW_NUMBER]
        ) AS [t10]
    ) AS [second], (
    SELECT [t15].[data2]
    FROM (
        SELECT TOP (1) [t14].[data2]
        FROM (
            SELECT [t13].[data2], [t13].[ROW_NUMBER]
            FROM (
                SELECT ROW_NUMBER() OVER (ORDER BY [t12].[data2]) AS [ROW_NUMBER], [t12].[data2]
                FROM (
                    SELECT [t11].[data2]
                    FROM [MyTable] AS [t11]
                    WHERE [t1].[data1] = [t11].[data1]
                    ) AS [t12]
                ) AS [t13]
            WHERE [t13].[ROW_NUMBER] > @p2
            ) AS [t14]
        ORDER BY [t14].[ROW_NUMBER]
        ) AS [t15]
    ) AS [third]
FROM (
    SELECT [t0].[data1]
    FROM [MyTable] AS [t0]
    GROUP BY [t0].[data1]
    ) AS [t1]
jwd
  • 10,837
  • 3
  • 43
  • 67
  • 1
    Wouldn't `FirstOrDefault().data2` throw a NullReferenceException if it returns default? – Bala R Sep 15 '11 at 01:34
  • @Bala: Interesting point. Yet it does seem to work. I just tried it out on the exact table I described (I made a test table), and in fact it results in 'third' being `null`. No exception thrown. I chalk it up to LINQ behind-the-scenes magick. – jwd Sep 15 '11 at 01:41
  • you should post another question about it :p – Bala R Sep 15 '11 at 01:45
  • 1
    @jwd - this isn't very efficient - it's calling the database 3 x n + 1 where n is the number of distinct `data1` values. You'd be much better off bringing the data into memory either before or after grouping, but definitely before doing the select. – Enigmativity Sep 16 '11 at 03:23
  • @Enigmativity: What makes you say it hits the database multiple times? I just tried this in LINQPad, and it only generated one (somewhat convoluted) SQL query. Granted, my real code is more complicated than the example in my question, but is there something in particular about this style of query that makes you say it will hit the DB multiple times? – jwd Sep 16 '11 at 16:37
  • @Enigmativity: I added the generated SQL, for clarity. – jwd Sep 16 '11 at 16:39
  • @jwd - I did the same thing with LINQPad and got multiple queries. I did query against a SQLite database so I suspect that different providers have different query strategies. Nevertheless it does produce a convoluted query that has repeated sections. I would still bring this in to memory - and only the data needed - before the final select. – Enigmativity Sep 16 '11 at 23:48
0

I assume that you could have more than three columns from the data2 field?

If so there's no way to do you query that returns an anonymous type with a variable number of properties. You need to return an array or some sort of list for the data2 set of values.

I think this is the kind of thing that you can do:

var query = 
    from mt in MyTable
    group mt.data2 by mt.data1 into gmts
    let d2 = gmts.ToArray()
    select new
    {
        data1 = gmts.Key,
        data2 = d2,
        length = d2.Length,
    };

var pending = query.ToArray();

var maxLength = pending.Max(p => p.length);

Func<string[], string[]> extend = xs =>
{
    var r = new string[maxLength];
    xs.CopyTo(r, 0);
    return r;
};

var results =
    from p in pending
    select new
    {
        p.data1,
        data2 = extend(p.data2),
    };

This produces a series of anonymous type with the data2 array all being the same size to fit the maximum number of results for any of the data1 fields.

The query is still executed as a single SQL query. And the in-memory processing is fast.

Does this work for you?


EDIT

Since you know you have a fixed number of columns (as per comment) you can easily change my results query to meet your requirements:

var results =
    from p in pending
    let d2s = extend(p.data2)
    select new
    {
        p.data1,
        first = d2s[0],
        second = d2s[1],
        third = d2s[2],
    };
Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • I havnen't tried your code, but I definitely want to avoid a `ToArray()` on the client side. In point of fact, I do only care about a fixed number of columns, I don't need the super-generic case. – jwd Sep 15 '11 at 16:59
  • @jwd - why do want to avoid `ToArray()`? – Enigmativity Sep 16 '11 at 02:54
  • @jwd - I've edited my solution to include a query that returns fixed columns. – Enigmativity Sep 16 '11 at 03:01
  • I want to avoid `ToArray()` because that would pull all those rows from the DB into memory. My question is simplified, but this is part of a larger query, and the pivot happens somewhere in the middle of the whole procedure. – jwd Sep 16 '11 at 16:30