I am trying to append a column called time
to my pre-existing dataFrame df1
. The values for my new column time
are derived using the unique identifier column snapshot
which exists in both df1
and df2
to link them.
I've tried to leverage both apply
and lambda
to help me create this column but it doesn't work.
df1 = pd.DataFrame({'product' : ['PENS', 'PENS', 'PENS', 'PENS', 'PENS','STAPLER','STAPLER'],
'price' : [20,25,35,40,70,100,140],
'snapshot' : [1,2,3,4,5,1,2]})
???df1
price product snapshot
0 20 PENS 1
1 25 PENS 2
2 35 PENS 3
3 40 PENS 4
4 70 PENS 5
5 100 STAPLER 1
6 140 STAPLER 2
df2 = pd.DataFrame({'snapshot' : [1,2,3,4,5],
'publish_time' : ['10/10/2005', '2/19/2007', '6/20/2007', '7/10/2010', '7/15/2010']})
>>>df2
publish_time snapshot
0 10/10/2005 1
1 2/19/2007 2
2 6/20/2007 3
3 7/10/2010 4
4 7/15/2010 5
df1['time'] = df1['snapshot'].apply(lambda x : df2['publish_time'].loc[df2['snapshot'] == x])
ValueError: Wrong number of items passed 5, placement implies 1
Ideally what I want is this something like this:
>>>df1
price product time snapshot
0 20 PENS 10/10/2005 1
1 25 PENS 2/19/2007 2
2 35 PENS 6/20/2007 3
3 40 PENS 7/10/2010 4
4 70 PENS 7/15/2010 5
5 100 STAPLER 10/10/2005 1
6 140 STAPLER 2/19/2007 2