
My problem is that I do not know how to compare the numbers in two different columns of the same dataframe. I would like to check, row by row, whether the number in the second column is at least twice the number in the first column, and then filter the rows so that I end up with a dataframe in which every value in the second column is at least twice the value in the first column. At first I did this:

ac = pd.DataFrame.dropna(ab)
ad = pd.DataFrame.drop_duplicates(ac)

There were so many NaN values that I decided to get rid of them.

ad["first column"] = ad["first column"].astype(float)
ad["second column"] = ad["second column"].astype(float)

Even without these lines, I still get the same error in the following step.

Then I tried to take the next step:

boolean = []

def comp(number):
    if ad.loc[:, "first column"] >= ad.loc[:, "second column"]*2:

        boolean.append[True]

    else:

         boolean.append[False]

At first I wrote it as a for loop, but then I changed it to this function so that I could use the apply() method. Either way, I get this error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index Probe Set ID')
Andrei
Anderson
  • Possible duplicate of [Compare two columns using pandas](https://stackoverflow.com/questions/27474921/compare-two-columns-using-pandas) – Andrei Feb 19 '19 at 23:37
  • @Anderson Doesn't this work: `df[df['first_col'] >= (df['second_col']*2)]` ? – panktijk Feb 19 '19 at 23:48

2 Answers

You can pull each column out as its own Series and compare the two element-wise.

df = pd.DataFrame(... all your data with columns...)
df = df.astype(float)  # convert the whole df to float

firstcol = df['firstcol']
secondcol = df['secondcol']

# a new boolean Series: True where the second column is at least twice the first
booleanmatch = secondcol >= firstcol * 2

# keep only the rows where the mask is True
df = df.loc[booleanmatch, :]

Hope this solves the problem.
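
For reference, here is a minimal self-contained sketch of the mask-based filtering, assuming the goal stated in the question (second column at least twice the first). The sample values are made up for illustration, and the column names `first column` and `second column` are taken from the question. The comparison yields one boolean per row rather than a single `True`/`False`, which is why putting it directly in an `if` raises the "truth value of a Series is ambiguous" error, while using it as a row mask works.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the asker's own file.
df = pd.DataFrame({
    "first column": [1.0, 2.0, 3.0, 4.0],
    "second column": [2.0, 3.0, 7.0, 9.0],
})

# Element-wise comparison: one True/False per row, not a single bool.
mask = df["second column"] >= 2 * df["first column"]

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

No loop, `apply()`, or hand-built list of booleans is needed, because the comparison is already vectorized over all rows.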

user2589273

To compare two columns in a dataframe, you can use .query: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html

import pandas as pd
d = {'col1': [1, 2, 6], 'col2': [3, 4, 5]}
df = pd.DataFrame(data=d)
df.query('col1 > col2')
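
The same idea can be adapted to the "at least twice as big" condition from the question. The following is only a sketch on the toy data above; the renamed columns are an assumption meant to mirror the asker's column names. In reasonably recent pandas versions (0.25 and later), column names containing spaces can be wrapped in backticks inside the query string.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 6], "col2": [3, 4, 5]})

# Keep rows where col2 is at least twice col1.
result = df.query("col2 >= 2 * col1")
print(result)

# Assumed column names with spaces, as in the question; backticks quote them.
df2 = df.rename(columns={"col1": "first column", "col2": "second column"})
result2 = df2.query("`second column` >= 2 * `first column`")
print(result2)
```

Note that `query` returns a new dataframe, so the result has to be assigned if you want to keep it.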
Manu