
I am trying to add a column called time to an existing DataFrame, df1. The values for the new column should come from the publish_time column of df2, matched on the identifier column snapshot, which exists in both df1 and df2.

I've tried using apply with a lambda to create this column, but the assignment fails with a ValueError.

import pandas as pd

df1 = pd.DataFrame({'product' : ['PENS', 'PENS', 'PENS', 'PENS', 'PENS','STAPLER','STAPLER'],
                    'price' : [20,25,35,40,70,100,140],
                    'snapshot' : [1,2,3,4,5,1,2]})

>>>df1
   price  product  snapshot
0     20     PENS         1
1     25     PENS         2
2     35     PENS         3
3     40     PENS         4
4     70     PENS         5
5    100  STAPLER         1
6    140  STAPLER         2

df2 = pd.DataFrame({'snapshot' : [1,2,3,4,5],
               'publish_time' : ['10/10/2005', '2/19/2007', '6/20/2007', '7/10/2010', '7/15/2010']})

>>>df2
  publish_time  snapshot
0   10/10/2005         1
1    2/19/2007         2
2    6/20/2007         3
3    7/10/2010         4
4    7/15/2010         5   

df1['time'] = df1['snapshot'].apply(lambda x : df2['publish_time'].loc[df2['snapshot'] == x])

ValueError: Wrong number of items passed 5, placement implies 1
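
A likely reason for this error: the expression inside the lambda selects with a boolean mask, so it returns a one-element Series rather than a single string. When apply receives a Series from every call, it builds a multi-column result, which cannot be assigned to one column. The small sketch below checks that assumption for a single snapshot value; the names lookup and scalar are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'snapshot': [1, 2, 3, 4, 5],
                    'publish_time': ['10/10/2005', '2/19/2007', '6/20/2007',
                                     '7/10/2010', '7/15/2010']})

# Same expression as in the lambda, evaluated for one snapshot value (x = 2).
lookup = df2['publish_time'].loc[df2['snapshot'] == 2]

# The mask keeps a Series (with its index), even when only one row matches.
print(type(lookup))

# Pulling out the first element gives the plain string instead.
scalar = lookup.iloc[0]
print(scalar)
```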

Ideally, what I want is something like this:

>>>df1
    price  product         time  snapshot
 0     20     PENS   10/10/2005         1
 1     25     PENS    2/19/2007         2
 2     35     PENS    6/20/2007         3
 3     40     PENS    7/10/2010         4
 4     70     PENS    7/15/2010         5
 5    100  STAPLER   10/10/2005         1
 6    140  STAPLER    2/19/2007         2
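
One possible way to get this result, sketched under the assumption that each snapshot value appears only once in df2: turn df2 into a snapshot-to-publish_time lookup Series and pass it to Series.map, or do a left merge on snapshot. The names lookup and merged are illustrative and not part of the original code.

```python
import pandas as pd

df1 = pd.DataFrame({'product': ['PENS', 'PENS', 'PENS', 'PENS', 'PENS',
                                'STAPLER', 'STAPLER'],
                    'price': [20, 25, 35, 40, 70, 100, 140],
                    'snapshot': [1, 2, 3, 4, 5, 1, 2]})

df2 = pd.DataFrame({'snapshot': [1, 2, 3, 4, 5],
                    'publish_time': ['10/10/2005', '2/19/2007', '6/20/2007',
                                     '7/10/2010', '7/15/2010']})

# Option 1: left merge on the shared key, renaming publish_time to time.
# This returns a new DataFrame and leaves df1 unchanged.
merged = df1.merge(df2.rename(columns={'publish_time': 'time'}),
                   on='snapshot', how='left')

# Option 2: build a lookup Series indexed by snapshot, then map each
# snapshot value in df1 to its publish_time (assumes unique snapshots in df2).
lookup = df2.set_index('snapshot')['publish_time']
df1['time'] = df1['snapshot'].map(lookup)

print(df1)
```

Both options leave NaN in time for any snapshot value that has no match in df2.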
nrs90