4

I have a dataframe A that looks like this

bucket  value   
1       0.001855    
1       0.000120    
2       0.000042    
2       0.001888    

and a dataframe B that looks like this

bucket  num 
1       .5  
2       .3

I want to create a column in A that has all value divided by num in B matched by bucket. How do I do this?

SuperString
  • 21,593
  • 37
  • 91
  • 122

3 Answers3

3

UPDATE: answers the following question from the comment:

What if A is a multiindex? With ['bucket1','bucket2'] as index but we only care for bucket1?

In [140]: A
Out[140]:
                    value
bucket1 bucket2
1       10       0.001855
        11       0.000120
2       12       0.000042
        13       0.001888

In [141]: B
Out[141]:
   bucket  num
0       1  0.5
1       2  0.3

In [142]: A['new'] = A.value / A.reset_index().iloc[:, 0].map(B.set_index('bucket').num).values

In [143]: A
Out[143]:
                    value       new
bucket1 bucket2
1       10       0.001855  0.003710
        11       0.000120  0.000240
2       12       0.000042  0.000140
        13       0.001888  0.006293

OLD answer:

you can use Series.map() method:

In [61]: A['new'] = A.value.div(A.bucket.map(B.set_index('bucket').num))

In [62]: A
Out[62]:
   bucket     value       new
0       1  0.001855  0.003710
1       1  0.000120  0.000240
2       2  0.000042  0.000140
3       2  0.001888  0.006293

or as a virtual column:

In [60]: A.assign(new=A.value/A.bucket.map(B.set_index('bucket').num))
Out[60]:
   bucket     value       new
0       1  0.001855  0.003710
1       1  0.000120  0.000240
2       2  0.000042  0.000140
3       2  0.001888  0.006293

Explanation:

In [65]: B.set_index('bucket')
Out[65]:
        num
bucket
1       0.5
2       0.3

In [66]: A.bucket.map(B.set_index('bucket').num)
Out[66]:
0    0.5
1    0.5
2    0.3
3    0.3
Name: bucket, dtype: float64
Community
  • 1
  • 1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • What if A is a multiindex? With ['bucket1','bucket2'] as index but we only care for bucket1? I am getting No axis named 2 for object type as an error – SuperString Dec 22 '16 at 20:34
  • 1
    @SuperString, can you post __reproducible__ sample data sets and expected data set? Pleas read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Dec 22 '16 at 20:47
  • 1
    @SuperString in all fairness, that is the sort of thing you bring up in the question. – piRSquared Dec 22 '16 at 21:14
1

I'm really just admiring @MaxU's answer and wanted to contribute something.
Here is a numpy answer

A.value /= B.num.values.dot(B.bucket.values[:, None] == A.bucket.values)

A

   bucket     value
0       1  0.003710
1       1  0.000240
2       2  0.000140
3       2  0.006293
piRSquared
  • 285,575
  • 57
  • 475
  • 624
0

Probably not as efficient, but you could also use merge to properly distribute values of num in dfB across dfA and then use element-wise division at the index level. to calculate these values.

dfA['new'] = dfA['value'] / pd.merge(dfA, dfB, on='bucket')['num']

dfA
   bucket     value       new
0       1  0.001855  0.003710
1       1  0.000120  0.000240
2       2  0.000042  0.000140
3       2  0.001888  0.006293
lmo
  • 37,904
  • 9
  • 56
  • 69