Newbie trying to break my addiction to Excel. I have a data set of paid invoices with the vendor, the country where each invoice was paid, and the amount. For each vendor, I want to know which country has the greatest total invoice amount and what percentage of the vendor's total business is in that country. Here is a sample data set and what I have so far:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Company': ['bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
                   'Country': ['two', 'one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
                   'Amount': [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
                   'Pct': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})
# Group the invoices by vendor and country, then total each group
CoCntry = df.groupby(['Company', 'Country'])
CoCntry.aggregate('sum')
After looking at multiple examples, including "Extract row with max value" and "Getting max value using groupby", I've gotten as far as creating a DataFrameGroupBy that summarizes the invoice data by company and country. I'm struggling with how to find the max row. After that, I still need to figure out how to calculate the percentage. Advice welcome.
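For reference, here is a minimal sketch of one way the two remaining steps (finding the max row per company, then computing the percentage) might be approached. It assumes the goal is the country with the largest summed amount per company, and it leaves out the placeholder 'Pct' column from the sample data. The names by_country, top_rows, company_total and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country': ['two', 'one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount': [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
})

# Total invoice amount for each (Company, Country) pair, kept as regular columns.
by_country = df.groupby(['Company', 'Country'], as_index=False)['Amount'].sum()

# Row label of the largest country total within each company.
top_rows = by_country.groupby('Company')['Amount'].idxmax()
result = by_country.loc[top_rows].reset_index(drop=True)

# Each company's overall total, used as the denominator for the percentage.
company_total = df.groupby('Company')['Amount'].sum()
result['Pct'] = result['Amount'] / result['Company'].map(company_total) * 100

print(result)
```

On the sample data this should pick country 'one' for bar (11 out of 25, i.e. 44%) and country 'two' for foo (11 out of 28, roughly 39.3%). Note that idxmax returns only the first maximum, so ties between countries would need separate handling.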