I'm working on a presentation of two views of the same data, which is stored in a datatable with these fields (irrelevant fields not shown):
First view is the datatable itself:
Second view is a Pivot-like table where lines are defined by same set_id
and user_id
and columns are defined by sets of three with same date
but storing different values: rows.Count, rows.Sum(value) and rows.Sum(volumes). So I wrote this code:
Sub Test()
' Simulated Data
Dim dtDA = New DataTable
dtDA.Columns.Add("id", GetType(Integer))
dtDA.Columns.Add("set_id", GetType(Integer))
dtDA.Columns.Add("user_id", GetType(Integer))
dtDA.Columns.Add("date", GetType(Integer))
dtDA.Columns.Add("value", GetType(Integer))
dtDA.Columns.Add("volumes", GetType(Integer))
dtDA.Rows.Add(1001, 1, 11, 20160505, 1, 1)
dtDA.Rows.Add(1002, 1, 12, 20160505, 1, 2)
dtDA.Rows.Add(1003, 1, 13, 20160505, 1, 1)
dtDA.Rows.Add(1004, 1, 11, 20160505, 1, 1)
dtDA.Rows.Add(1005, 2, 14, 20160505, 1, 1)
dtDA.Rows.Add(1006, 2, 15, 20160505, 1, 2)
dtDA.Rows.Add(1007, 2, 16, 20160505, 1, 1)
dtDA.Rows.Add(1008, 2, 14, 20160505, 1, 1)
dtDA.Rows.Add(1009, 1, 12, 20160512, 1, 1)
dtDA.Rows.Add(1010, 1, 13, 20160512, 1, 2)
dtDA.Rows.Add(1011, 1, 11, 20160512, 1, 1)
dtDA.Rows.Add(1012, 1, 12, 20160512, 1, 1)
dtDA.Rows.Add(1013, 2, 15, 20160512, 1, 1)
dtDA.Rows.Add(1014, 2, 16, 20160512, 1, 2)
dtDA.Rows.Add(1015, 2, 14, 20160512, 1, 1)
dtDA.Rows.Add(1016, 2, 15, 20160512, 1, 1)
'Analysis
Dim DS = dtDA.Select.GroupBy(
Function(dr) New With {
.set_id = dr.Field(Of Integer?)("set_id").GetValueOrDefault,
.user_id = dr.Field(Of Integer?)("user_id").GetValueOrDefault,
.date = dr.Field(Of Integer?)("date").GetValueOrDefault}
).GroupBy(
Function(gSRD) New With {
.set_id = gSRD.Key.set_id,
.user_id = gSRD.Key.user_id})
Dim dtDS As New DataTable, drDS As DataRow
dtDS.Columns.Add("set_id", GetType(Integer))
dtDS.Columns.Add("user_id", GetType(String))
dtDS.Columns.Add("date", GetType(Integer))
If DS.Any Then
For Each d In DS.SelectMany(Function(g) g.Select(Function(gg) gg.Key.date)).Distinct.OrderBy(Function(i) i)
dtDS.Columns.Add("c" & d, GetType(Integer))
dtDS.Columns.Add("p" & d, GetType(Integer))
dtDS.Columns.Add("v" & d, GetType(Integer))
Next
For Each gSR In DS
drDS = dtDS.NewRow
drDS.SetField("set_id", gSR.Key.set_id)
drDS.SetField("user_id", gSR.Key.user_id)
For Each gSRD In gSR
drDS.SetField("c" & gSRD.Key.date, gSRD.Count)
drDS.SetField("p" & gSRD.Key.date, gSRD.Sum(Function(dr) dr.Field(Of Integer?)("value").GetValueOrDefault))
drDS.SetField("v" & gSRD.Key.date, gSRD.Sum(Function(dr) dr.Field(Of Integer?)("volumes").GetValueOrDefault))
Next
dtDS.Rows.Add(drDS)
Next
End If
End Sub
I was expecting to get this:
set_id|user_id|c20160505|p20160505|v20160505|c20160512|p20160512|v20160512
------+-------+---------+---------+---------+---------+---------+---------
1| 11| 2| 2| 2| 1| 1| 1
1| 12| 1| 1| 2| 2| 2| 2
1| 13| 1| 1| 1| 1| 1| 2
2| 14| 2| 2| 2| 1| 1| 1
2| 15| 1| 1| 2| 2| 2| 2
2| 16| 1| 1| 1| 1| 1| 2
Instead, I'm getting this:
Can anyone help me and tell what I'm doing wrong here? Thank you very much!
EDIT
The problem seems to be I based my code on the premise that Linq grouping by key as Anonymous Type would behave as described in https://stackoverflow.com/a/12124777/3718031 but unfortunately it seems it is not quite so. I noticed that if I replace anonymous types by concatenating strings, my code does what I first expected:
Dim DS = dtDA.Select.GroupBy(
Function(dr) dr.Field(Of Integer?)("set_id").GetValueOrDefault & "," & dr.Field(Of Integer?)("user_id").GetValueOrDefault & "," & dr.Field(Of Integer?)("date").GetValueOrDefault
).GroupBy(
Function(gSRD) gSRD.First.Field(Of Integer?)("set_id").GetValueOrDefault & "," & gSRD.First.Field(Of Integer?)("user_id").GetValueOrDefault)
Dim dtDS As New DataTable, drDS As DataRow
dtDS.Columns.Add("set_id", GetType(Integer))
dtDS.Columns.Add("user_id", GetType(String))
If DS.Any Then
For Each d In DS.SelectMany(Function(g) g.Select(Function(gg) gg.First.Field(Of Integer?)("date").GetValueOrDefault)).Distinct.OrderBy(Function(i) i)
dtDS.Columns.Add("c" & d, GetType(Integer))
dtDS.Columns.Add("p" & d, GetType(Integer))
dtDS.Columns.Add("v" & d, GetType(Integer))
Next
For Each gSR In DS
drDS = dtDS.NewRow
drDS.SetField("set_id", gSR.First.First.Field(Of Integer?)("set_id").GetValueOrDefault)
drDS.SetField("user_id", gSR.First.First.Field(Of Integer?)("user_id").GetValueOrDefault)
For Each gSRD In gSR
drDS.SetField("c" & gSRD.First.Field(Of Integer?)("date").GetValueOrDefault, gSRD.Count)
drDS.SetField("p" & gSRD.First.Field(Of Integer?)("date").GetValueOrDefault, gSRD.Sum(Function(dr) dr.Field(Of Integer?)("value").GetValueOrDefault))
drDS.SetField("v" & gSRD.First.Field(Of Integer?)("date").GetValueOrDefault, gSRD.Sum(Function(dr) dr.Field(Of Integer?)("volumes").GetValueOrDefault))
Next
dtDS.Rows.Add(drDS)
Next
End If
So, the remaining question is: why are not anonymous types working as grouping keys based on equality of each one of their properties?