How to filter out rows in a DataFrame that differ by a sign?

Question

I have a DataFrame whose rows are arrays of numbers of some given length. I want to get rid of rows that differ from another row only by an overall factor of -1. For example, if I encounter

 1 -2  2  4 -4

-1  2 -2 -4  4

then I want to drop one of them. Is there a not overly expensive way of doing that in Python/Pandas?

An example of such a DataFrame can be found here.

Can you provide an example database? See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — SchwarzeHuhn, Dec 30 '19 at 21:36

score 1 · Answer 1 · answered Dec 31 '19 at 08:13

There's one trick you can use, but only if you have an odd number of columns and no zeros in the data.

In that case, exactly one of a row and its negation has more than n_cols/2 negative values. So you can invert the sign of every row with a negative majority and then perform drop_duplicates.

import pandas as pd
df = pd.read_csv('example_csv.csv')
cols = df.columns
# Count the negative entries in each row
n_minus = (df[cols] < 0).sum(axis=1)
# Rows where more than half of the entries are negative get their sign flipped
flip = n_minus > len(cols) / 2.0
df.loc[flip, cols] = df.loc[flip, cols] * -1
new_df = df[cols].drop_duplicates()
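
If the number of columns is even, or rows can contain zeros, counting negative values is no longer enough to tell a row from its negation. A more general variant, which is not part of the answer above and is only a sketch, is to flip each row so that its first nonzero entry is positive and then look for duplicates among the flipped rows. The function name drop_sign_duplicates and the sample data are made up for illustration, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


def drop_sign_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row from every group of rows that are equal up to a factor of -1.

    Hypothetical helper; assumes every column of df is numeric.
    """
    values = df.to_numpy()
    # Position of the first nonzero entry in each row (0 for all-zero rows)
    first = (values != 0).argmax(axis=1)
    lead = values[np.arange(len(values)), first]
    # Flip rows whose first nonzero entry is negative; all-zero rows stay as they are
    sign = np.where(lead < 0, -1, 1)
    # Adding 0 turns any -0.0 produced by the flip back into 0.0 for float data
    canonical = pd.DataFrame(values * sign[:, None] + 0,
                             index=df.index, columns=df.columns)
    # Return the original (unflipped) rows whose canonical form occurs first
    return df.loc[~canonical.duplicated()]


# Made-up sample data: rows 0/1 and rows 2/3 are negations of each other
df = pd.DataFrame([
    [1, -2, 2, 4, -4],
    [-1, 2, -2, -4, 4],
    [0, -3, 3, 1, 1],
    [0, 3, -3, -1, -1],
    [2, 2, 2, 2, 2],
])
result = drop_sign_duplicates(df)
print(result)
```

Unlike the sign-counting trick, this returns the surviving rows with their original signs, since the flipped copy is only used to detect duplicates.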
