I'm trying to remove rows from one DataFrame (df1) using multiple conditions based on values in a second DataFrame (df2). The columns I want to compare across these DataFrames are 'Timestamp' (T) and 'delta_t' (dt).

The rule I want to apply is: when T_{df1} == T_{df2}, remove all rows where dt_{df2} - 0.1 < dt_{df1} < dt_{df2}.

In other words, when the timestamp values from the two DataFrames are equal, I then want to compare the delta_t values. If a delta_t value in df1 falls within +/- 0.1 of the delta_t value in df2, that row should be removed from df1.

Any help is much appreciated!

Cheers!

I have tried using df1['timestamp'].isin(df2['timestamp']) to get the rows with matching timestamp values, but I'm not sure how to compare the delta_t values and remove the rows that fall within a specific range.

EDIT: The data is originally stored in one DataFrame with many columns, one of which is labelled 'channel'. To form the two DataFrames (df1, df2) that I compare, I split on the channel value as follows:

noise = df1[df1['channel'] == 3]['timestamp_copy']
df2 = df1.loc[df1['timestamp_copy'].isin(noise)]

Therefore, the number of rows in df1 >> df2.

JR_11
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to pandas questions. – itprorh66 Jul 13 '23 at 17:52

1 Answer

If I understood you correctly, this should do what you need: select the indices where your condition is satisfied and then drop them from df1:

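import pandas as pd

# Toy frames with the same length and the same default index
df1 = pd.DataFrame([[1, 2], [10, 11]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [11, 10]], columns=['a', 'b'])

# Rows are compared label by label: 'a' must match and 'b' must be within 0.1
mask = (df1['a'] == df2['a']) & ((df1['b'] - df2['b']).abs() <= 0.1)
indices_to_remove = df1[mask].index
df1 = df1.drop(indices_to_remove)
print(df1)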

Just replace a and b with your column names.
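Note that the comparison above pairs rows by index label, so it assumes df1 and df2 have the same length and the same index. If that is not the case, one possible alternative (a sketch only, not part of the original answer) is to merge the two frames on the timestamp first and then test the delta_t difference per matched pair. The column names 'timestamp' and 'delta_t' are taken from the question; the sample values, the `tolerance` variable and the helper column `row_id` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; df2 is deliberately shorter than df1.
df1 = pd.DataFrame({
    "timestamp": [1, 1, 2, 3, 4],
    "delta_t": [0.50, 0.95, 1.00, 2.00, 3.00],
})
df2 = pd.DataFrame({
    "timestamp": [1, 2],
    "delta_t": [0.55, 1.50],
})

tolerance = 0.1  # assumed symmetric +/- window from the question text

# Keep df1's original index as a column ("row_id") so it survives the merge,
# then pair each df1 row with every df2 row that has the same timestamp.
pairs = (
    df1.rename_axis("row_id")
       .reset_index()
       .merge(df2, on="timestamp", suffixes=("_1", "_2"))
)

# True where the two delta_t values are within the tolerance of each other.
# For the one-sided rule in the question, this could instead be:
# (pairs["delta_t_1"] > pairs["delta_t_2"] - tolerance) & (pairs["delta_t_1"] < pairs["delta_t_2"])
close = (pairs["delta_t_1"] - pairs["delta_t_2"]).abs() <= tolerance

# Drop every df1 row that matched at least one df2 row within the window.
to_drop = pairs.loc[close, "row_id"].unique()
filtered = df1.drop(index=to_drop)
print(filtered)
```

Because the merge is an inner join, df1 rows whose timestamp never appears in df2 produce no pairs and are therefore left untouched in `filtered`.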

Adriaan
David George
  • Hi, thank you for the kind words and the help, I really appreciate it! I have a further question after trying to implement your suggestion. When I run these lines I get a ValueError: "Can only compare identically-labeled Series objects". The DataFrames I am using are not the same length; would this be the reason for the error? Thanks again for your time and help! – JR_11 Jul 14 '23 at 12:40
  • Hi, thanks! Yes, this problem can happen if your DataFrames are of different sizes. How would you like to handle rows in df1 that don't have matching rows in df2 to check against? For example, just leave them unchanged? – David George Jul 14 '23 at 15:38
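To illustrate the error discussed in these comments, here is a minimal sketch (the Series `a` and `b` are made-up data, not from the question). Comparing two Series with `==` requires identical index labels, so Series of different lengths raise a ValueError, whereas `isin` compares by value and does not care about length.

```python
import pandas as pd

# Two hypothetical Series of different lengths (so their indexes differ).
a = pd.Series([1, 2, 3])
b = pd.Series([1, 2])

# Element-wise == needs identically labelled Series and raises otherwise.
try:
    a == b
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# isin() tests membership by value, so differing lengths are fine.
mask = a.isin(b)
print(mask.tolist())  # [True, True, False]
```

This is why a value-based match such as `isin` or a merge on the timestamp column tends to be a better fit than a row-by-row `==` when df1 has many more rows than df2.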