Efficiently adding calculated rows based on index values to a pandas DataFrame

Question

I have a pandas DataFrame in the following format:

     a   b   c
0    0   1   2
1    3   4   5
2    6   7   8
3    9  10  11
4   12  13  14
5   15  16  17

I want to append a calculated row that performs some math based on a given item's index value, e.g. adding a row that sums the values of all items with an index value < 2, with the new row having an index label of 'Red'. Ultimately, I am trying to add three rows that group the index values into categories:

A row with the sum of item values where index values are < 2, labeled as 'Red'
A row with the sum of item values where index values are 1 < x < 4, labeled as 'Blue'
A row with the sum of item values where index values are > 3, labeled as 'Green'

Ideal output would look like this:

       a   b   c
0      0   1   2
1      3   4   5
2      6   7   8
3      9  10  11
4     12  13  14
5     15  16  17
Red    3   5   7
Blue  15  17  19
Green 27  29  31

My current solution involves transposing the DataFrame, applying a map function for each calculated column and then re-transposing, but I would imagine pandas has a more efficient way of doing this, likely using .append().

EDIT: My inelegant preset-list solution (originally used .transpose() but I improved it using .groupby() and .append()):

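import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=['a', 'b', 'c'])
# Tag each row with its group label, then sum within each label.
df['x'] = ['Red', 'Red', 'Blue', 'Blue', 'Green', 'Green']
df2 = df.groupby('x').sum()
# Note: DataFrame.append was removed in pandas 2.0; pd.concat([df, df2]) replaces it there.
df = df.append(df2)
del df['x']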

I much prefer the flexibility of BrenBarn's answer (see below).

For your second case, do you mean 2 < x < 4? – BrenBarn May 28 '13 at 18:29

BrenBarn · Accepted Answer · 2013-05-28T19:23:12.653

Here is one way:

def group(ix):
    if ix < 2:
        return "Red"
    elif 2 <= ix < 4:
        return "Blue"
    else:
        return "Green"

>>> print(d)
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17
>>> print(d.append(d.groupby(d.index.to_series().map(group)).sum()))
        a   b   c
0       0   1   2
1       3   4   5
2       6   7   8
3       9  10  11
4      12  13  14
5      15  16  17
Blue   15  17  19
Green  27  29  31
Red     3   5   7

For the general case, you need to define a function (or dict) to handle the mapping to different groups. Then you can just use groupby and its usual abilities.
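The dict variant mentioned above could look roughly like the following. This is only a sketch: the `groups` lookup table is a hypothetical name introduced for illustration, and `pd.concat` is used in place of `append`, which newer pandas releases (2.0 and later) no longer provide.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# Hypothetical lookup table (not from the original answer): index label -> group name.
groups = {0: "Red", 1: "Red", 2: "Blue", 3: "Blue", 4: "Green", 5: "Green"}

# Map every index label to its group name, then sum the rows within each group.
keys = d.index.to_series().map(groups)
sums = d.groupby(keys).sum()

# pd.concat stands in for DataFrame.append here.
result = pd.concat([d, sums])
print(result)
```

A dict is convenient when the groups are an arbitrary list of labels; a function, as in `group` above, is more compact when the groups follow a rule.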

For your particular case, it can be done more simply by directly slicing on the index value as Dan Allan showed, but that will break down if you have a more complex case where the groups you want are not simply definable in terms of contiguous blocks of rows. The method above will also easily extend to situations where the groups you want to create are not based on the index but on some other column (i.e., group together all rows whose value in column X is within range 0-10, or whatever).
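As a rough illustration of grouping on a column rather than the index, one option is to bin the column with `pd.cut` and group on the resulting labels. The bin edges, the choice of column "a", and the name "group" below are assumptions made up for this sketch, not part of the original question.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# Assumed bins on column "a": [0, 6) -> Red, [6, 12) -> Blue, [12, 18) -> Green.
labels = pd.cut(
    d["a"],
    bins=[0, 6, 12, 18],
    right=False,
    labels=["Red", "Blue", "Green"],
).rename("group")  # renamed so the key does not share a name with column "a"

# Sum the rows that fall into each bin.
sums = d.groupby(labels, observed=True).sum()

# Use plain string labels so the combined index mixes only ints and strings.
sums.index = sums.index.astype(str)

# pd.concat is used because DataFrame.append is gone in pandas 2.0 and later.
result = pd.concat([d, sums])
print(result)
```

With this sample data the bins happen to select the same rows as the index-based grouping, so the sums match the ones shown above.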

score 2 · Answer 2 · answered May 28 '13 at 18:32

The role of "transpose," which you say you used in your unshown solution, might be played more naturally by the orient keyword argument, which is available when you construct a DataFrame from a dictionary.

In [23]: df
Out[23]: 
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17

In [24]: sums = {'Red': df.loc[:1].sum(),
                 'Blue': df.loc[2:3].sum(),
                 'Green': df.loc[4:].sum()}

In [25]: DataFrame.from_dict(sums, orient='index')
Out[25]: 
        a   b   c
Blue   15  17  19
Green  27  29  31
Red     3   5   7

In [26]: df.append(_)
Out[26]: 
        a   b   c
0       0   1   2
1       3   4   5
2       6   7   8
3       9  10  11
4      12  13  14
5      15  16  17
Blue   15  17  19
Green  27  29  31
Red     3   5   7
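For readers on a recent pandas, where `DataFrame.append` has been removed (as of 2.0), the same slicing idea might be written as a standalone script along these lines. It is a sketch rather than the answer's exact session: `pd.concat` is swapped in for `append`, and the name `sums` is chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# .loc slices by label and includes both endpoints,
# so df.loc[:1] covers labels 0 and 1, and df.loc[2:3] covers 2 and 3.
sums = pd.DataFrame.from_dict(
    {
        "Red": df.loc[:1].sum(),
        "Blue": df.loc[2:3].sum(),
        "Green": df.loc[4:].sum(),
    },
    orient="index",  # each dict key becomes a row label
)

# Stack the summary rows under the original rows.
result = pd.concat([df, sums])
print(result)
```

Because `.loc` is label-based and endpoint-inclusive, the slices only line up with the intended groups while the index consists of the sorted integer labels shown in the question.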

Based on the numbers in your example, I assume that by "> 4" you actually meant ">= 4".
