I am wondering if it is possible to use LINQ to pivot data from the following layout:

CustID | OrderDate | Qty
1      | 1/1/2008  | 100
2      | 1/2/2008  | 200
1      | 2/2/2008  | 350
2      | 2/28/2008 | 221
1      | 3/12/2008 | 250
2      | 3/15/2008 | 2150

into something like this:

CustID | Jan-2008 | Feb-2008 | Mar-2008
1      | 100      | 350      | 250
2      | 200      | 221      | 2150
ΩmegaMan
Tim Lentine

7 Answers

Something like this?

List<CustData> myList = GetCustData();

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => new {
        CustId = g.Key,
        Jan = g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
        Feb = g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
        March = g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty)
    });

GroupBy in LINQ does not work the same way as GROUP BY in SQL. In SQL, you get the key and aggregates (a row/column shape). In LINQ, you get the key and its elements as children of the key (a hierarchical shape). To pivot, you must project the hierarchy back into a row/column form of your choosing.
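To make the two shapes concrete, here is a minimal sketch using the six rows from the question. The class name GroupShapeSketch and its methods are made up for illustration; CustData mirrors the class used elsewhere in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the CustData class used elsewhere in this thread.
public class CustData
{
    public int CustId { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class GroupShapeSketch
{
    // The six rows from the question.
    public static List<CustData> SampleData()
    {
        return new List<CustData>
        {
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 1, 1), Qty = 100 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 1, 2), Qty = 200 },
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 2, 2), Qty = 350 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 2, 28), Qty = 221 },
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 3, 12), Qty = 250 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 3, 15), Qty = 2150 },
        };
    }

    // Hierarchical shape: one entry per key, listing that key's child quantities.
    public static List<string> Hierarchy(IEnumerable<CustData> data)
    {
        return data
            .GroupBy(c => c.CustId)
            .Select(g => g.Key + ": " + string.Join(", ", g.Select(c => c.Qty)))
            .ToList();
    }

    // Row/column shape: each group is projected into one flat row
    // with a fixed column per month (Jan, Feb, Mar).
    public static List<string> Rows(IEnumerable<CustData> data)
    {
        return data
            .GroupBy(c => c.CustId)
            .Select(g => string.Join(" | ",
                g.Key,
                g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
                g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
                g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty)))
            .ToList();
    }
}
```

Hierarchy returns "1: 100, 350, 250" and "2: 200, 221, 2150" (a key with its children), while Rows returns "1 | 100 | 350 | 250" and "2 | 200 | 221 | 2150" (the pivoted rows).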

nawfal
Amy B
  • Does the list have to be an IEnumerable before you can apply the pivot? Or can this also be done on an IQueryable from EF (without having to materialize the list in memory)? – Rob Vermeulen Feb 27 '20 at 08:58
  • @RobVermeulen I could translate that query into SQL, so I would expect EF to be able to translate it as well. Give it a try I guess? – Amy B Feb 27 '20 at 20:21
  • I tested it, and it sort of works. However, SQL Profiler shows that EF does not translate it into a (fast) pivot query but into a couple of slower subqueries. – Rob Vermeulen Mar 06 '20 at 10:31
  • Thanks for this answer. I wanted to post some LINQPad code so people could see this work, so I "answered" the question below. I don't know how to reference this answer though. – Grant Johnson Dec 16 '21 at 21:25
  • What can I do if the columns are dynamic rather than static like the 12 months, for example if they are variable course names? – Anyname Donotcare Apr 12 '22 at 11:59

I answered a similar question using a LINQ extension method:

// order s(ource) by OrderDate to have proper column ordering
var r = s.Pivot3(e => e.custID, e => e.OrderDate.ToString("MMM-yyyy")
    , lst => lst.Sum(e => e.Qty));
// order r(esult) by CustID

(+) generic implementation
(-) definitely slower than Amy B's

Can anyone improve my implementation (i.e. the method does the ordering of columns & rows)?
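For readers who want to see the general idea, here is a minimal sketch of a generic pivot extension. It is not the Pivot3 implementation from the linked answer; the names PivotSketch, Pivot and QtyByCustomerAndMonth are hypothetical, and the result is a plain nested dictionary with no ordering of rows or columns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Same shape as the CustData class used elsewhere in this thread.
public class CustData
{
    public int CustId { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

// Hypothetical extension class; a sketch, not the Pivot3 method mentioned above.
public static class PivotSketch
{
    // Groups by row key, then by column key inside each row,
    // and applies the aggregate to every (row, column) cell.
    public static Dictionary<TRow, Dictionary<TCol, TValue>> Pivot<TSource, TRow, TCol, TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, TRow> rowSelector,
        Func<TSource, TCol> columnSelector,
        Func<IEnumerable<TSource>, TValue> aggregate)
    {
        return source
            .GroupBy(rowSelector)
            .ToDictionary(
                rowGroup => rowGroup.Key,
                rowGroup => rowGroup
                    .GroupBy(columnSelector)
                    .ToDictionary(
                        cellGroup => cellGroup.Key,
                        cellGroup => aggregate(cellGroup)));
    }

    // Example call: total Qty per customer and per "MMM-yyyy" month label.
    // The invariant culture keeps the labels independent of the machine's locale.
    public static Dictionary<int, Dictionary<string, int>> QtyByCustomerAndMonth(
        IEnumerable<CustData> data)
    {
        return data.Pivot(
            c => c.CustId,
            c => c.OrderDate.ToString("MMM-yyyy", CultureInfo.InvariantCulture),
            cell => cell.Sum(c => c.Qty));
    }
}
```

Because the column keys are computed from the data, this shape also covers cases where the columns are not known in advance. A cell with no source rows is simply absent from the inner dictionary, so callers would use TryGetValue rather than the indexer for sparse data.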

Amy B
Sanjaya.Tio

The neatest approach for this, I think, is to use a lookup:

var query =
    from c in myList
    group c by c.CustId into gcs
    let lookup = gcs.ToLookup(y => y.OrderDate.Month, y => y.Qty)
    select new
    {
        CustId = gcs.Key,
        Jan = lookup[1].Sum(),
        Feb = lookup[2].Sum(),
        Mar = lookup[3].Sum(),
    };
Enigmativity

Here is a somewhat more generic way to pivot data using LINQ:

IEnumerable<CustData> s;
var groupedData = s.ToLookup( 
        k => new ValueKey(
            k.CustID, // 1st dimension
            String.Format("{0}-{1}", k.OrderDate.Month, k.OrderDate.Year // 2nd dimension
        ) ) );
var rowKeys = groupedData.Select(g => (int)g.Key.DimKeys[0]).Distinct().OrderBy(k=>k);
var columnKeys = groupedData.Select(g => (string)g.Key.DimKeys[1]).Distinct().OrderBy(k=>k);
foreach (var row in rowKeys) {
    Console.Write("CustID {0}: ", row);
    foreach (var column in columnKeys) {
        Console.Write("{0:####} ", groupedData[new ValueKey(row,column)].Sum(r=>r.Qty) );
    }
    Console.WriteLine();
}

where ValueKey is a special class that represents a multidimensional key:

public sealed class ValueKey {
    public readonly object[] DimKeys;
    public ValueKey(params object[] dimKeys) {
        DimKeys = dimKeys;
    }
    public override int GetHashCode() {
        if (DimKeys==null) return 0;
        int hashCode = DimKeys.Length;
        for (int i = 0; i < DimKeys.Length; i++) { 
            hashCode ^= DimKeys[i].GetHashCode();
        }
        return hashCode;
    }
    public override bool Equals(object obj) {
        if ( obj==null || !(obj is ValueKey))
            return false;
        var x = DimKeys;
        var y = ((ValueKey)obj).DimKeys;
        if (ReferenceEquals(x,y))
            return true;
        if (x.Length!=y.Length)
            return false;
        for (int i = 0; i < x.Length; i++) {
            if (!x[i].Equals(y[i]))
                return false;
        }
        return true;            
    }
}

This approach can be used for grouping by N dimensions (N > 2) and works fine for fairly small datasets. For large datasets (1 million records or more), or for cases where the pivot configuration cannot be hardcoded, I've written a dedicated PivotData library (it is free):

var pvtData = new PivotData(new []{"CustID","OrderDate"}, new SumAggregatorFactory("Qty"));
pvtData.ProcessData(s, (o, f) => {
    var custData = (CustData)o; // s is an IEnumerable<CustData>
    switch (f) {
        case "CustID": return custData.CustID;
        case "OrderDate": 
        return String.Format("{0}-{1}", custData.OrderDate.Month, custData.OrderDate.Year);
        case "Qty": return custData.Qty;
    }
    return null;
} );
Console.WriteLine( pvtData[1, "1-2008"].Value );  
Vitaliy Fedorchenko
// LINQPad code for Amy B's answer
void Main()
{
    List<CustData> myList = GetCustData();
    
    var query = myList
        .GroupBy(c => c.CustId)
        .Select(g => new
        {
            CustId = g.Key,
            Jan = g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
            Feb = g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
            March = g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty),
            //April = g.Where(c => c.OrderDate.Month == 4).Sum(c => c.Qty),
            //May = g.Where(c => c.OrderDate.Month == 5).Sum(c => c.Qty),
            //June = g.Where(c => c.OrderDate.Month == 6).Sum(c => c.Qty),
            //July = g.Where(c => c.OrderDate.Month == 7).Sum(c => c.Qty),
            //August = g.Where(c => c.OrderDate.Month == 8).Sum(c => c.Qty),
            //September = g.Where(c => c.OrderDate.Month == 9).Sum(c => c.Qty),
            //October = g.Where(c => c.OrderDate.Month == 10).Sum(c => c.Qty),
            //November = g.Where(c => c.OrderDate.Month == 11).Sum(c => c.Qty),
            //December = g.Where(c => c.OrderDate.Month == 12).Sum(c => c.Qty)          
        });
        
    
    query.Dump();
}

/// <summary>
/// --------------------------------
/// CustID  | OrderDate     | Qty
/// --------------------------------
/// 1       | 1 / 1 / 2008  | 100
/// 2       | 1 / 2 / 2008  | 200
/// 1       | 2 / 2 / 2008  | 350
/// 2       | 2 / 28 / 2008 | 221
/// 1       | 3 / 12 / 2008 | 250
/// 2       | 3 / 15 / 2008 | 2150 
/// </summary>
public List<CustData> GetCustData()
{
    List<CustData> custData = new List<CustData>
    {
        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 1, 1),
            Qty = 100
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 1, 2),
            Qty = 200
        },

        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 2, 2),
            Qty = 350
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 2, 28),
            Qty = 221
        },

        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 3, 12),
            Qty = 250
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 3, 15),
            Qty = 2150
        },      
    };

    return custData;
}

public class CustData
{
    public int CustId;
    public DateTime OrderDate;
    public uint Qty;
}

Grant Johnson

A more efficient approach is to iterate through each customer's group only once, instead of once for each month:

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => {
        var results = new CustomerStatistics();
        foreach (var customer in g)
        {
            switch (customer.OrderDate.Month)
            {
                case 1:
                    results.Jan += customer.Qty;
                    break;
                case 2:
                    results.Feb += customer.Qty;
                    break;
                case 3:
                    results.March += customer.Qty;
                    break;
                default:
                    break;
            }
        }
        return  new
        {
            CustId = g.Key,
            results.Jan,
            results.Feb,
            results.March
        };
    });

Or this one, using Aggregate:

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => {
        var results = g.Aggregate(new CustomerStatistics(), (result, customer) => result.Accumulate(customer), customerStatistics => customerStatistics.Compute());
        return  new
        {
            CustId = g.Key,
            results.Jan,
            results.Feb,
            results.March
        };
    });

Complete solution:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            IEnumerable<CustData> myList = GetCustData().Take(100);

            var query = myList
                .GroupBy(c => c.CustId)
                .Select(g =>
                {
                    CustomerStatistics results = g.Aggregate(new CustomerStatistics(), (result, customer) => result.Accumulate(customer), customerStatistics => customerStatistics.Compute());
                    return new
                    {
                        CustId = g.Key,
                        results.Jan,
                        results.Feb,
                        results.March
                    };
                });
            Console.ReadKey();
        }

        private static IEnumerable<CustData> GetCustData()
        {
            Random random = new Random();
            int custId = 0;
            while (true)
            {
                custId++;
                yield return new CustData { CustId = custId, OrderDate = new DateTime(2018, random.Next(1, 4), 1), Qty = random.Next(1, 50) };
            }
        }

    }
    public class CustData
    {
        public int CustId { get; set; }
        public DateTime OrderDate { get; set; }
        public int Qty { get; set; }
    }
    public class CustomerStatistics
    {
        public int Jan { get; set; }
        public int Feb { get; set; }
        public int March { get; set; }
        internal CustomerStatistics Accumulate(CustData customer)
        {
            switch (customer.OrderDate.Month)
            {
                case 1:
                    Jan += customer.Qty;
                    break;
                case 2:
                    Feb += customer.Qty;
                    break;
                case 3:
                    March += customer.Qty;
                    break;
                default:
                    break;
            }
            return this;
        }
        public CustomerStatistics Compute()
        {
            return this;
        }
    }
}
Ali Bayat

Group your data on month, and then project it into a new datatable with columns for each month. The new table would be your pivot table.
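A minimal sketch of that idea, assuming the CustData class used elsewhere in this thread. The class name DataTablePivotSketch, the method ToPivotTable and the "MMM-yyyy" column naming are illustrative choices, not something this answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

// Same shape as the CustData class used elsewhere in this thread.
public class CustData
{
    public int CustId { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class DataTablePivotSketch
{
    public static DataTable ToPivotTable(IEnumerable<CustData> data)
    {
        List<CustData> list = data.ToList();

        // Group on month (keyed by the first day of that month), oldest first.
        var byMonth = list
            .GroupBy(c => new DateTime(c.OrderDate.Year, c.OrderDate.Month, 1))
            .OrderBy(g => g.Key)
            .ToList();

        // One key column plus one column per month found in the data.
        var table = new DataTable("Pivot");
        table.Columns.Add("CustID", typeof(int));
        foreach (var month in byMonth)
        {
            table.Columns.Add(ColumnName(month.Key), typeof(int));
        }

        // One row per customer; each cell is that customer's total for the month.
        foreach (int custId in list.Select(c => c.CustId).Distinct().OrderBy(id => id))
        {
            DataRow row = table.NewRow();
            row["CustID"] = custId;
            foreach (var month in byMonth)
            {
                row[ColumnName(month.Key)] = month
                    .Where(c => c.CustId == custId)
                    .Sum(c => c.Qty);
            }
            table.Rows.Add(row);
        }

        return table;
    }

    // Invariant culture keeps column names independent of the machine's locale.
    private static string ColumnName(DateTime month)
    {
        return month.ToString("MMM-yyyy", CultureInfo.InvariantCulture);
    }
}
```

With the six rows from the question, the resulting table has the columns CustID, Jan-2008, Feb-2008 and Mar-2008, and one row per customer. The month columns come from the data, so the table grows when more months are present.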

ryanyuyu
mattlant