Dual-key statistics in a Pandas DataFrame.
To process data from Excel files, I am trying to use a Pandas DataFrame to compute statistics. The sheet is structured something like this:
Section | Team | Bugs | Functions | month
---|---|---|---|---
A | dev | 1 | 1 | Jan
B | dev | 2 | 2 | Jan
B | design | 3 | 0 | Feb
A | design | 4 | 3 | Jan
A | design | 4 | 3 | Sep
A | dev | 4 | 3 | Jul
B | dev | 2 | 2 | Nov
I need the total number of Bugs and Functions for every (Section, Team) pair. For example, based on the data above, for (A, dev) the total number of bugs (TNB :)) = 5 (1+4), and for (A, design) the TNB = 8 (4+4) and the total number of functions (TNF) = 6 (3+3).
I have tried the following:
`for key, value in df.value_counts().items(): ...`
However, it only computes statistics for a single key, which is not the expected result.
Alternatively, I could split the original table into several sub-tables by 'Section', but that adds extra steps.
Any suggestions (an algorithm or an existing API) for computing these statistics with lower time and space complexity?
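
For reference, here is a minimal sketch of the result I am after, written as a group-by on both keys at once. It assumes the sheet has already been loaded into a DataFrame with the column names from the table above; the DataFrame is rebuilt by hand here so the snippet is self-contained, and the variable name `totals` is just illustrative. I am not sure this is the most efficient approach.

```python
import pandas as pd

# Sample data copied from the table above (in practice this would come
# from the Excel sheet, e.g. via pd.read_excel).
df = pd.DataFrame({
    "Section":   ["A", "B", "B", "A", "A", "A", "B"],
    "Team":      ["dev", "dev", "design", "design", "design", "dev", "dev"],
    "Bugs":      [1, 2, 3, 4, 4, 4, 2],
    "Functions": [1, 2, 0, 3, 3, 3, 2],
    "month":     ["Jan", "Jan", "Feb", "Jan", "Sep", "Jul", "Nov"],
})

# Group by both keys in one pass and sum only the numeric columns of interest.
totals = df.groupby(["Section", "Team"])[["Bugs", "Functions"]].sum()

print(totals)

# Look up a single (Section, Team) pair through the resulting MultiIndex.
print(totals.loc[("A", "dev"), "Bugs"])
```

With this layout, each (Section, Team) pair becomes one row of `totals`, so no manual splitting into sub-tables is needed.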