
I have a DataFrame whose rows are arrays of numbers of a fixed length. I want to drop rows that differ from another row only by a factor of -1, i.e. rows that are the exact negation of another row. For example, if I encounter

 1 -2  2  4 -4

-1  2 -2 -4  4

then I want to drop one of them. Is there a reasonably efficient way of doing that in Python/pandas?

An example of such a DataFrame can be found here.

Tomasz Kania

1 Answer


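There's one trick you can use, but only if you have an odd number of columns and no zero entries.

With an odd number of columns, exactly one of a row and its negation has negative values in more than half of its entries. So you can invert the sign of every row that has more than n_cols/2.0 negative values and then call drop_duplicates. For example, with 5 columns a row with 3 negative values gets flipped, so it matches its negation, which has 2.

import pandas as pd

df = pd.read_csv('example_csv.csv')
cols = df.columns
# Count the negative entries in each row
n_minus = (df[cols] < 0).sum(axis=1)
# Flip the rows where negative values are the majority, so a row and its negation coincide
flip = n_minus > len(cols) / 2.0
df.loc[flip, cols] = -df.loc[flip, cols]
new_df = df.drop_duplicates()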
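
If the number of columns is even, or some entries are zero, counting negative values no longer singles out one row of each pair. A possible alternative, sketched below and not taken from the question's data, is to fix the sign of each row by its first non-zero entry before looking for duplicates. The function name drop_sign_duplicates and the example frame are made up for illustration, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


def drop_sign_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row from each group of rows that are equal up to a factor of -1."""
    values = df.to_numpy()
    nonzero = values != 0
    # Column position of the first non-zero entry in each row (0 for all-zero rows)
    first = nonzero.argmax(axis=1)
    lead = values[np.arange(len(values)), first]
    # Rows whose first non-zero entry is negative are multiplied by -1
    sign = np.where(lead < 0, -1, 1)
    canonical = pd.DataFrame(values * sign[:, None], index=df.index, columns=df.columns)
    # Find repeats among the sign-normalised rows, then filter the original rows
    return df[~canonical.duplicated()]


# Hypothetical example data: rows 0/1 and rows 2/3 are negations of each other
example = pd.DataFrame(
    [
        [1, -2, 2, 4, -4],
        [-1, 2, -2, -4, 4],
        [0, 3, -1, 0, 2],
        [0, -3, 1, 0, -2],
        [1, 1, 1, 1, 1],
    ]
)
result = drop_sign_duplicates(example)
```

Unlike the sign-flipping trick above, this keeps the surviving rows with their original signs, because the normalised copy is only used to decide which rows to drop.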
thushv89