I have a DataFrame that looks like this:
>>> import pandas
>>> df = pandas.DataFrame({'region' : ['east', 'west', 'south', 'west',
... 'east', 'west', 'east', 'west'],
... 'item' : ['one', 'one', 'two', 'three',
... 'two', 'two', 'one', 'three'],
... 'quantity' : [3,3,4,5,12,14,3,8], "price" : [50,50,12,35,10,10,12,12]})
>>> df
item price quantity region
0 one 50 3 east
1 one 50 3 west
2 two 12 4 south
3 three 35 5 west
4 two 10 12 east
5 two 10 14 west
6 one 12 3 east
7 three 12 8 west
What I want to do is modify the values in the quantity column. Each new quantity value is calculated based on the number of different regions that exist for this row's combination of item and price. More concretely, I want to take each quantity and multiply it by the weight of its region, as returned by a function I wrote that takes a region and the list of regions composing the pool:
region_weight(region, list_of_regions)
For this imaginary situation, let's say:
- region east is worth 1
- region west is worth 2
- region south is worth 3
Then the weight returned for east in the pool [east, west] is 0.3333333333333333 (1 / (1 + 2) = 1/3). The weight of south in the pool [east, west, south] is 0.5 (3 / (1 + 2 + 3) = 1/2).
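To make these numbers reproducible, here is a minimal stand-in for region_weight. It is only a sketch: the REGION_VALUES table and the normalisation (a region's value divided by the sum of the values of the distinct regions in the pool) are assumptions chosen to match the figures above, not my actual function.

```python
# Hypothetical values for the imaginary situation described above.
REGION_VALUES = {"east": 1, "west": 2, "south": 3}


def region_weight(region, list_of_regions):
    """Return the share of `region` within the pool `list_of_regions`.

    Assumption: duplicates in the pool are ignored, so each region
    counts once no matter how many rows mention it.
    """
    pool_total = sum(REGION_VALUES[r] for r in set(list_of_regions))
    # float() keeps the division exact under Python 2 as well.
    return REGION_VALUES[region] / float(pool_total)


print(region_weight("east", ["east", "west"]))            # 1 / (1 + 2)
print(region_weight("south", ["east", "west", "south"]))  # 3 / (1 + 2 + 3)
```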
So for the first row, we look at which rows have item one and price 50. There are 2 (counting the first row itself): one with the east region and one with the west region. The new quantity in the first row would be: 3 * region_weight("east", ["east", "west"])
or 3 * 0.3333333333333333.
I want to apply the same process to the whole quantity column. I don't know how to approach this problem with the pandas library other than by looping through the DataFrame row by row.
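For reference, the row-by-row version I would like to avoid looks roughly like the sketch below. The region_weight defined here is a simplified stand-in (the REGION_VALUES table is an assumption matching the imaginary values above), and new_quantity is just an illustrative name.

```python
import pandas

# Hypothetical stand-in for my real region_weight function.
REGION_VALUES = {"east": 1, "west": 2, "south": 3}


def region_weight(region, list_of_regions):
    pool_total = sum(REGION_VALUES[r] for r in set(list_of_regions))
    return REGION_VALUES[region] / float(pool_total)


df = pandas.DataFrame({
    "region": ["east", "west", "south", "west", "east", "west", "east", "west"],
    "item": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "quantity": [3, 3, 4, 5, 12, 14, 3, 8],
    "price": [50, 50, 12, 35, 10, 10, 12, 12],
})

new_quantity = []
for _, row in df.iterrows():
    # All rows sharing this row's (item, price) combination, itself included.
    same_group = df[(df["item"] == row["item"]) & (df["price"] == row["price"])]
    # The pool is the set of distinct regions found in that group.
    pool = list(same_group["region"].unique())
    new_quantity.append(row["quantity"] * region_weight(row["region"], pool))

df["quantity"] = new_quantity
print(df)
```

This filters the whole DataFrame once per row, which is exactly the kind of looping I am hoping pandas can replace.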