0

I am trying to split the volume for a billing line by source. The billing line data volume is reported as one value, but I know that 55% of the volume originates from Source A, and 45% originates from Source B. How would I create new rows in my Pandas dataframe to split the row into two rows, one for each source?

I can calculate what the new volume value would be for each source and put it in new columns, but I'm not sure how to input those values into new rows.

Source A should be 55% of the Count, and Source B should be 45% of the Count.

from pandas import DataFrame
import numpy as np

before = DataFrame([{'Day': 1, 'Billing Line': 'abcdefg', 'Count': 1000},
                   {'Day': 2, 'Billing Line': 'abcdefg', 'Count': 2000}])


after = DataFrame([{'Day': 1, 'Billing Line': 'abcdefg', 'Count': 550, 'Source': 'a'},
               {'Day': 1,'Billing Line': 'abcdefg', 'Count': 450, 'Source':'b'},
                 {'Day': 2,'Billing Line': 'abcdefg', 'Count': 1100, 'Source':'a'},
                  {'Day': 2,'Billing Line': 'abcdefg', 'Count': 900, 'Source':'b'}])

1 Answers1

0

We using unnest

before['pct']=[[0.45,0.55]]*len(before)
before['Source']=[['a','b']]*len(before)

unnesting(before,['pct','Source']).eval('Count=Count*pct')
Out[395]: 
    pct Source Billing Line   Count  Day
0  0.45      a      abcdefg   450.0    1
0  0.55      b      abcdefg   550.0    1
1  0.45      a      abcdefg   900.0    2
1  0.55      b      abcdefg  1100.0    2
BENY
  • 317,841
  • 20
  • 164
  • 234