I am currently working on a code that reads and combines two data frames and outputs them into a separate CSV. However, when getting the result, it seems like duplicates occur. Both data frames have data recorded in 10 hz. However one is recorded in 1 seconds intervals (There will for example be 10 cells with the time-column value 14:51:21). The other one has milliseconds (14:51:21.352, 14:51:21.452, ...). When I merge the data frames I get an output that is 10x longer than the original data. My guess is that the data with 1 second intervals merges to 10 different cells.
Here's an example of what my data frames look like:
Data frame 1:
TIME FUEL OTHER OTHER2
...
2020-12-03 22:06:16.944 124 49 157
2020-12-03 22:06:17.044 113 42 132
2020-12-03 22:06:17.144 144 11 111
2020-12-03 22:06:17.244 224 29 557
2020-12-03 22:06:17.344 1224 422 151
2020-12-03 22:06:17.444 324 56 257
2020-12-03 22:06:17.544 18 29 157
2020-12-03 22:06:17.644 144 99 182
2020-12-03 22:06:17.744 672 14 977
2020-12-03 22:06:17.844 655 13 87
2020-12-03 22:06:17.944 194 15 14
...
Data frame 2:
TIME SPEED OTHER OTHER2
...
2020-12-03 22:06:16 50 100 52
2020-12-03 22:06:17 60 130 22
2020-12-03 22:06:17 100 105 13
2020-12-03 22:06:17 24 99 104
2020-12-03 22:06:17 44 24 22
2020-12-03 22:06:17 66 20 257
2020-12-03 22:06:17 110 95 102
2020-12-03 22:06:17 90 22 24
2020-12-03 22:06:17 10 64 51
2020-12-03 22:06:17 11 66 61
2020-12-03 22:06:17 100 100 103
...
I am using the merge_asof method.
CODE THAT IS MERGING MY DATA FRAMES
merged_df = pd.merge_asof(ewholedf, wholedf.sort_values('TIME').assign(Mätarställning2=wholedf['TIME']), on='TIME', direction='nearest')
merged_df['TIME_Diff'] = merged_df['TIME'].sub(merged_df['TIME2']).abs()
merged_df.sort_values(by=['TIME2', 'TIME_Diff']).drop_duplicates(subset=['TIME2']).sort_index().drop(['TIME2', 'TIME_Diff'], axis=1)
OUTPUT
I cant attach the output as text here because of lenght and clarity so I will post an screenshot of the output in order to give you an better understanding of the situtation.
The data in the data frame examples are sample data which is why the data in the image looks different (it should not affect the output).
(separator: ",".)
In the image above you can see the red arrows marking the frequency of occurring duplicates and also the yellow marker showing that the time column is not merging the 10 hz data and keeping it 10 hz (there are more than 10 rows with the same second).
If my theory is right, please update me with a solution. If not, update me with the problem and the solution.
Hopefully this was enough information. Thank you in advance!