With PowerPivot in Exycel I am able to
a) create a data model that consists of several connected tables (see example below)
b) create a pivot table based on that model
For display, the pivot table does not use id (=integer) values but the corresponding string values as row/column headers.
With pandas I could
a) load and join those related tables b) create a pivot table based on the joined table
pivot_table = pandas.pivot_table(
joined_table,
index=["scenario_name"], #entries to show as row headers
columns='param_name', #entries to show as column headers
values='value', #entries to aggregate and show as cells
aggfunc=numpy.sum, #aggregation function(s)
)
However, with huge tables, I would expect it to be more efficient, if the pivot_table operates on the non-joined data table and applies the string values only for result display.
=> Is there a convenient way to consider foreign key relations when using
pandas DataFrame
and pivot_table
?
I would expect something like
pivot_table = pandas.pivot_table(
{"data": data_table,
"scenario": scenario_table,
"param": param_table
},
index=["scenario:name"], #entries to show as row headers
columns="param:name", #entries to show as column headers
values="data:value", #entries to aggregate and show as cells
aggfunc=numpy.sum, #aggregation function(s)
)
=> If not, are there some alternatives libraries to pandas that could handle related tables as source for pivot tables?
Small example table structure:
table "data"
id scenario_id param_id value
1 1 1 100
2 1 2 200
table "scenario"
id name
1 reference
2 best_case
table "param"
id name
1 solar
2 wind
scenario_id
of table data
points on id
of table scenario
param_id
of table data
points on id
of table param
Another example with some more columns: