I have a pandas DataFrame like the one below: each customer_address is associated with several stores (store_id and store_description), and the numeric distance column holds the distance between that address and each store.
import pandas as pd

customer_address = ['random address -1234-caaap', 'random address -1234-caaap', 'random address -1234-caaap',
                    'random address -xxxxx-caaap', 'random address -xxxxx-caaap', 'random address -xxxxx-caaap']
store_id = ['1234', '4567', '7894', '1234', '4567', '7894']
store_description = ['store #1', 'store #2', 'store #3', 'store #1', 'store #2', 'store #3']
distance = [13, 25, 6, 13, 25, 3]
df = pd.DataFrame()
df['customer_address'] = customer_address
df['store_id'] = store_id
df['store_description'] = store_description
df['distance'] = distance
Now, for each customer_address, I want only the minimum distance, together with the store_id and store_description of the store at that distance.
I tried something like this:
df.groupby('customer_address').min()
but I get the correct distance paired with the wrong store_id and store_description (the correct store_id should be '7894' for both addresses).
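To see where the mismatch comes from, here is a small reproduction using the same data, rebuilt from a dict for brevity. `groupby(...).min()` computes the minimum of each column on its own within each group, so store_id and store_description get their own lexicographic minimum ('1234' and 'store #1') instead of the values from the row that has the smallest distance.

```python
import pandas as pd

# Same data as in the question, built from a dict for brevity
df = pd.DataFrame({
    'customer_address': ['random address -1234-caaap'] * 3 + ['random address -xxxxx-caaap'] * 3,
    'store_id': ['1234', '4567', '7894', '1234', '4567', '7894'],
    'store_description': ['store #1', 'store #2', 'store #3', 'store #1', 'store #2', 'store #3'],
    'distance': [13, 25, 6, 13, 25, 3],
})

# min() is applied to every column independently within each group:
# distance gets 6 and 3, but the string columns get their smallest
# string value ('1234', 'store #1'), not the values from the same row
wrong = df.groupby('customer_address').min()
print(wrong)
```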
Is there a way to get the whole row (store_id, store_description and distance) with the minimum distance for each customer_address?
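A minimal sketch of two common ways to keep whole rows rather than column-wise minimums, assuming the DataFrame from the question (the variable names `nearest` and `nearest_alt` are just illustrative). The first uses `idxmin` on the distance column to find, per address, the index label of the closest row and then selects those rows with `.loc`; the second sorts by distance and keeps the first row per address with `drop_duplicates`.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_address': ['random address -1234-caaap'] * 3 + ['random address -xxxxx-caaap'] * 3,
    'store_id': ['1234', '4567', '7894', '1234', '4567', '7894'],
    'store_description': ['store #1', 'store #2', 'store #3', 'store #1', 'store #2', 'store #3'],
    'distance': [13, 25, 6, 13, 25, 3],
})

# Approach 1: idxmin returns, for each address, the index label of the row
# with the smallest distance; .loc then pulls those complete rows
idx = df.groupby('customer_address')['distance'].idxmin()
nearest = df.loc[idx].reset_index(drop=True)
print(nearest)

# Approach 2: stable sort by distance (ties keep their original order),
# keep the first row seen for each address, then order by address
nearest_alt = (
    df.sort_values('distance', kind='stable')
      .drop_duplicates('customer_address')
      .sort_values('customer_address')
      .reset_index(drop=True)
)
```

Both pick store '7894' for each address here. If two stores are tied for the minimum distance, both approaches keep the one that appears first in the original DataFrame; if you need every tied store, filtering with `df['distance'] == df.groupby('customer_address')['distance'].transform('min')` keeps all of them.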