
I have a pandas DataFrame like this: each customer_address has a few stores associated with it (store_id and store_description), and the numeric distance column measures the distance between that address and each store_id.

import pandas as pd

customer_address = ['random address -1234-caaap', 'random address -1234-caaap', 'random address -1234-caaap',
                    'random address -xxxxx-caaap', 'random address -xxxxx-caaap', 'random address -xxxxx-caaap']
store_id = ['1234', '4567', '7894', '1234', '4567', '7894']
store_description = ['store #1', 'store #2', 'store #3', 'store #1', 'store #2', 'store #3']
distance = [13, 25, 6, 13, 25, 3]

# Build the DataFrame one column at a time
df = pd.DataFrame()
df['customer_address'] = customer_address
df['store_id'] = store_id
df['store_description'] = store_description
df['distance'] = distance

Now I want to get, for each customer_address, only the minimum distance, together with the store_id and store_description that belong to that distance.

I did something like this:

df.groupby('customer_address').min()

but I'm getting the right distance paired with the wrong store_id and store_description (the correct store_id should be '7894' for both customer_address values).


Is there any way to calculate it right?
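
For reference, here is a minimal sketch of one possible approach, not necessarily the only or best one. It relies on the fact that `groupby(...).min()` takes the minimum of each column independently, so the smallest store_id and store_description in a group get paired with the smallest distance even if they come from different rows. Using `idxmin` on the distance column and selecting whole rows with `.loc` keeps each row intact. The names `nearest_idx` and `nearest` are made up for illustration, and the sketch assumes the DataFrame has a unique index.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_address": ["random address -1234-caaap"] * 3
                        + ["random address -xxxxx-caaap"] * 3,
    "store_id": ["1234", "4567", "7894", "1234", "4567", "7894"],
    "store_description": ["store #1", "store #2", "store #3",
                          "store #1", "store #2", "store #3"],
    "distance": [13, 25, 6, 13, 25, 3],
})

# For each address, get the index label of the row with the smallest distance.
nearest_idx = df.groupby("customer_address")["distance"].idxmin()

# Select those complete rows, so store_id and store_description
# come from the same row as the minimum distance.
nearest = df.loc[nearest_idx].reset_index(drop=True)
print(nearest)
```

If several stores tie for the minimum distance at one address, `idxmin` keeps only the first such row.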

Parsifal