I need to add a new column to one DataFrame based on an ID it shares with another DataFrame.
I created a small snippet of what I'm trying to do:
import pandas as pd
import numpy as np
a = pd.DataFrame([['ass-123-43', ['123', '456']],['ass-123-44', ['123', '457']]], columns=['customer_id', 'order_id'])
b = pd.DataFrame([['ass-123-43'], ['ass-123-44']], columns=['customer_id'])
dict_a = a.set_index('customer_id').order_id.to_dict()
b['order_id'] = np.nan
for customer_id, order_id in dict_a.items():
    if customer_id in b.customer_id.values:
        b.iloc[b.customer_id == customer_id, 1] = pd.Series([order_id])
print(b)
When I use the iloc method, the code works as expected:
  customer_id    order_id
0  ass-123-43  [123, 456]
1  ass-123-44  [123, 457]
But when I use the loc method, it doesn't work as expected:
import pandas as pd
import numpy as np
a = pd.DataFrame([['ass-123-43', ['123', '456']],['ass-123-44', ['123', '457']]], columns=['customer_id', 'order_id'])
b = pd.DataFrame([['ass-123-43'], ['ass-123-44']], columns=['customer_id'])
dict_a = a.set_index('customer_id').order_id.to_dict()
b['order_id'] = np.nan
for customer_id, order_id in dict_a.items():
    if customer_id in b.customer_id.values:
        b.loc[b.customer_id == customer_id, 'order_id'] = pd.Series([order_id])
print(b)
I got this result:
  customer_id    order_id
0  ass-123-43  [123, 456]
1  ass-123-44         NaN
Besides loc selecting rows by label and iloc selecting them by integer position, is there some other difference I'm missing?
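One difference that may be relevant here is index alignment. As far as I can tell, when the right-hand side of a loc assignment is a Series, pandas aligns it with the selected rows by index label, and pd.Series([order_id]) only has the label 0. The sketch below isolates that case. The 'unset' placeholder and the names value, mask, aligned_result and mapped are made up for illustration and are not part of my original code. The last lines show a possible loop-free alternative using Series.map, assuming every customer_id appears at most once in a.

```python
import pandas as pd

b = pd.DataFrame({'customer_id': ['ass-123-43', 'ass-123-44']})
# Hypothetical placeholder so that any change to the column is visible.
b['order_id'] = pd.Series(['unset', 'unset'], dtype=object)

value = pd.Series([['123', '457']])       # one element, index label 0
mask = b['customer_id'] == 'ass-123-44'   # True only for index label 1

# loc aligns `value` with the selected rows by label. The selected row has
# label 1, `value` has only label 0, so the row receives NaN, not the list.
b.loc[mask, 'order_id'] = value
aligned_result = b.loc[1, 'order_id']
print(b)

# Possible alternative without a loop: look each customer_id up in the dict.
dict_a = {'ass-123-43': ['123', '456'], 'ass-123-44': ['123', '457']}
mapped = b['customer_id'].map(dict_a)
print(mapped)
```

If this reading is right, it would explain why only row 0 is filled in the loc version: that is the only row whose label matches the label of the one-element Series.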