apply a function to a pandas Dataframe whose returned value is based on other rows

Question

I have a Dataframe looking like this:

>>> import pandas
>>> df = pandas.DataFrame({'region' : ['east', 'west', 'south', 'west',
...  'east', 'west', 'east', 'west'],
...  'item' : ['one', 'one', 'two', 'three',
...         'two', 'two', 'one', 'three'],
...         'quantity' : [3,3,4,5,12,14,3,8], "price" : [50,50,12,35,10,10,12,12]})
>>> df
    item  price  quantity region
0    one     50         3   east
1    one     50         3   west
2    two     12         4  south
3  three     35         5   west
4    two     10        12   east
5    two     10        14   west
6    one     12         3   east
7  three     12         8   west

and what I want to do is modify the values in the quantity column. Each new quantity value is caculated based on the number of different regions that exist for this row's combination of item, and price. More concretly I want to take each quantity and multiply it by the weight of it's region returned by a function I wrote that takes a region and the list of other region composing the pool:

region_weight(region, list_of_regions). For this imaginary situation, let's say:

region east is worth 1
region west is worth 2
south worth is worth 3

Then the returned weight of east in the pool east, west is 0.3333333333333333 (1/3). The weight of south in pool east, west, south is 0.5 (1/2).

So for the first row, we look at what other rows there are of item one and price 50. There are 2 one with east and one with the west region. The new quantity in the first row would be: 3 * region_weight("east", ["east", "west"]) or 3 * 0.3333333333333333.

I want to apply the same process to the whole quantity column. I don't know how to approach this problem with the pandas library other than looping through the Dataframe row by row.

score 4 · Accepted Answer · answered Jan 21 '13 at 01:40

Ok, I think this does what you want:

Make a dictionary of your regional weights:

In [1]: weights = {'east':1,'west':2,'south':3}

The following function maps values from a Series to the value found in the weights dictionary. x is the row value of region and w is the region series after it has been mapped to the weights dict.

In [2]: def f(x):
   ...:     w = x.map(weights)
   ...:     return w / w.sum().astype(float)

Here, we groupby ['item','price'] and apply the function above. The output is a series of relative weights for the unique combinations of item and price.

In [3]: df.groupby(['item','price']).region.apply(f)
Out[3]:
0    0.333333
1    0.666667
2    1.000000
3    1.000000
4    0.333333
5    0.666667
6    1.000000
7    1.000000

Finally, you can multiply df.quantity by the above series to calculate your weight-adjusted quantities.

In [4]: df['wt_quant'] = df.groupby(['item','price']).region.apply(f) * df.quantity

In [5]: df
Out[5]:
    item  price  quantity region  wt_quant
0    one     50         3   east  1.000000
1    one     50         3   west  2.000000
2    two     12         4  south  4.000000
3  three     35         5   west  5.000000
4    two     10        12   east  4.000000
5    two     10        14   west  9.333333
6    one     12         3   east  3.000000
7  three     12         8   west  8.000000

apply a function to a pandas Dataframe whose returned value is based on other rows

1 Answers1

Linked