0

For post-processing of some measurements i have three pandas dataframes that look like this:

df1:
   direction  sequence  timestamp  remote_timestamp     delta
0          U         1     461945                -1        -1
1          U         2     462106                -1        -1
2          U         3     462269                -1        -1
..       ...       ...        ...               ...       ...
97         U        98     477601                -1        -1
98         U        99     477762                -1        -1
99         U       100     477924                -1        -1

df2:
  direction  sequence  timestamp  remote_timestamp  delta
0          U       101     500663                -1     -1
1          U       102     500829                -1     -1
2          U       103     501000                -1     -1
..       ...       ...        ...               ...    ...
98         U       199     516631                -1     -1
99         U       200     516792                -1     -1




df3
   direction  sequence  timestamp  remote_timestamp  delta
0           U         1         -1            462791     -1
1           U         2         -1            462791     -1
2           U         3         -1            462894     -1
..        ...       ...          ...        ...               ...    ...
197         U       198         -1            525099     -1
198         U       199         -1            525100     -1
199         U       200         -1            525100     -1

So i have two dataframes that have 100 lines and one that has 200 lines. Now i want to write the column "remote_timestamp" of the third dataframe into the first and second at the positions of their sequence number.

For either one of the frames I tried somethng like this:

df1['remote_timestamp'] = numpy.where(df1['sequence'].values == df3['sequence'].values,df3['remote_timestamp'],-1)

but since the lengths of the dataframes do not match i get:

ValueError: Length of values does not match length of index

What would be a way to solve the problem. Maybe numpy.where is not the ideal solution.

Thank you

Novice
  • 855
  • 8
  • 17
haddock
  • 33
  • 5

2 Answers2

0

If you change the 'sequence' column to the index for the third dataframe, you can then just access the data you want directly. The code below should work, but I can't reproduce your example based on what you gave us

df3.set_index('sequence')
new_values=df3['remote_timestamp'].loc[df1['sequence']]
df1['remote_timestamp']=new_values

I made a reproducible example myself and it appears to work fine

df=pd.DataFrame({'a':[1,2,3,4,5]})
df2=pd.DataFrame({'a':[1,2,3,4,5,6,7,8,9,10],'b':[0,9,8,7,6,5,4,3,2,11]})

df.set_index('a')

df2['b'].loc[df['a']]

which outputs

1    9
2    8
3    7
4    6
5    5
Name: b, dtype: int64

Hope this helps! And look into making smaller reproducible examples. not only does it help us solve your problems, I have often solved my own questions when I simplify things and make small test cases-

Novice
  • 855
  • 8
  • 17
0

If sequence in df3 doesn't have duplicates use Series.map

df1['remote_timestamp']=df1['sequence'].map(df3.set_index('sequence')['remote_timestamp'])
df2['remote_timestamp']=df2['sequence'].map(df3.set_index('sequence')['remote_timestamp'])
ansev
  • 30,322
  • 5
  • 17
  • 31