A lot of questions been asked about similar topic but none seem to work for me. The closest one are In a dataframe, I want to compare column A and column B and extract value in which A >= B?. I need to find matching values in one column and if they are the same. In the case of this example, I want to get rid of all the suppliers that do not have the biggest number of orders placed. At this point nothing that i have tried seem to do the trick.
Here is a toy dataset to illustrate the problem:
data = pd.DataFrame({'Id' : ['001', '001', '001', '001', '002', '002', '002', '003', '003', '003'],
'Supplier' : ['ABC', 'ABC', 'ABZ', 'ANA', 'ABA', 'ABC', 'ASS', 'ASS', 'ASS', 'ASS'],
'BuyPrice' : [7,7,7,8,7,5,7,7,7,7],
'Quantity' : [200,20,50,40,210,47,66,53,100,60]})
and what I am trying to achieve is:
dd = pd.DataFrame({'Id': ['001', '002', '003'],
'main_supplier' : ['ABC', 'ABA', 'ASS'],
'BuyPrice' : [7,7,7]})
where data_result['BuyPrice'] is the price associated with the main supplier
I have been able to do this:
dd = pd.DataFrame()
dd['Id'] = data1['Id']
dd['number_of_orders'] = pd.DataFrame(data1['Quantity']).applymap(lambda x: len(x))
dd['BuyPrice'] = pd.DataFrame(data1['BuyPrice']).applymap(lambda x: np.mean(x))
dd['Supplier'] = data1['Supplier']
for x in dd['Id']:
if dd['Id'] == dd['Id']:
dd['Principal_supplier'] = np.where(np.where((dd['number_of_orders'][x] > dd['number_of_order'][x-1]), 'Yes'))
else:
dd['Principal_supplier'] = 'No'
drop_Secondary_supplier = dd[dd['Principal_supplier' != 'Yes']].drop()
and i get the following error:
self.__class__.__name__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am struggling a lot of with the logic of comparing the "number of orders" if and only if the "id" are the same.
Any help would be huge!