I am following the suggestions in "pandas create new column based on values from other columns", but I am still getting an error. My pandas DataFrame has many columns, and I want to group it by a new categorical column whose value depends on two existing columns (AMP, Time).
```python
import datetime as dt

import pandas as pd

# make sure the Time column read from the CSV file is a datetime type
df['Time'] = pd.to_datetime(df['Time'])

day_1 = dt.date.today()
day_2 = dt.date.today() - dt.timedelta(days=1)

def f(row):
    if (row['AMP'] > 100) & (row['Time'] > day_1):
        val = 'new_positives'
    elif (row['AMP'] > 100) & (day_2 <= row['Time'] <= day_1):
        val = 'rec_positives'
    elif (row['AMP'] > 100) & (row['Time'] < day_2):
        val = 'old_positives'
    else:
        val = 'old_negatives'
    return val

df['GRP'] = df.apply(f, axis=1)  # this gives the following error:
```
```
TypeError: ("Cannot compare type 'Timestamp' with type 'date'", 'occurred at index 0')
```
```python
df[(df['AMP'] > 100) & (df['Time'] > day_1)]            # this works fine
df[(df['AMP'] > 100) & (day_2 <= df['Time'] <= day_1)]  # this works fine
df[(df['AMP'] > 100) & (df['Time'] < day_2)]            # this works fine
# df = df.groupby('GRP')
```
I am able to select the proper sub-dataframes based on the conditions above, but when I apply the function to each row, I get the error. What is the correct approach to group the dataframe based on these conditions?
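A minimal sketch of one possible fix for the row-wise version, assuming the error comes from comparing a pandas Timestamp (which is what row['Time'] is inside apply) with a plain datetime.date: convert the two cut-off dates to pd.Timestamp once, before calling apply. The four-row frame is made-up data standing in for the real one, and "today" is pinned to a fixed date only so the result is reproducible.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the real data: only AMP and Time matter here.
df = pd.DataFrame({
    'AMP': [150, 150, 150, 50],
    'Time': ['2024-01-11 08:00', '2024-01-09 12:00',
             '2024-01-05 09:00', '2024-01-11 08:00'],
})
df['Time'] = pd.to_datetime(df['Time'])

# Fixed "today" for reproducibility; in practice use dt.date.today().
today = dt.date(2024, 1, 10)

# Convert the cut-off dates to Timestamps so they compare cleanly with row['Time'].
day_1 = pd.Timestamp(today)
day_2 = pd.Timestamp(today - dt.timedelta(days=1))

def f(row):
    # Plain `and` is enough here: each row value is a scalar, not a Series.
    if row['AMP'] > 100 and row['Time'] > day_1:
        return 'new_positives'
    elif row['AMP'] > 100 and day_2 <= row['Time'] <= day_1:
        return 'rec_positives'
    elif row['AMP'] > 100 and row['Time'] < day_2:
        return 'old_positives'
    return 'old_negatives'

df['GRP'] = df.apply(f, axis=1)
print(df['GRP'].tolist())
# ['new_positives', 'rec_positives', 'old_positives', 'old_negatives']

groups = df.groupby('GRP')  # the grouping the question is ultimately after
```

pd.Timestamp(today) is midnight at the start of that day, so a reading from later the same day falls into 'new_positives', just as the original conditions imply.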
EDIT:
Unfortunately, I cannot provide a sample of my dataframe. However, here is a simple dataframe that gives an error of the same type:
```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': np.arange(10),
                     'b': np.random.rand(10)})

def f1(row):
    if row['a'] < 5 & row['b'] < 0.5:
        value = 'less'
    elif row['a'] < 5 & row['b'] > 0.5:
        value = 'more'
    else:
        value = 'same'
    return value

mydf['GRP'] = mydf.apply(f1, axis=1)
```
```
TypeError: ("unsupported operand type(s) for &: 'int' and 'float'", 'occurred at index 0')
```
EDIT 2: As suggested below, enclosing each comparison in parentheses fixed the made-up example, so that problem is solved.
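For reference, a parenthesized version of the toy function might look like the following. Without the parentheses, row['a'] < 5 & row['b'] < 0.5 is parsed as the chained comparison row['a'] < (5 & row['b']) < 0.5, because & binds more tightly than <, and applying & to a float is what raises the TypeError. The seed call is an addition that only makes the random column reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added only so the random column is reproducible
mydf = pd.DataFrame({'a': np.arange(10),
                     'b': np.random.rand(10)})

def f1(row):
    # Each comparison is evaluated on its own first, then the two booleans are
    # combined; `and` is the idiomatic choice for scalar values.
    if (row['a'] < 5) and (row['b'] < 0.5):
        value = 'less'
    elif (row['a'] < 5) and (row['b'] > 0.5):
        value = 'more'
    else:
        value = 'same'
    return value

mydf['GRP'] = mydf.apply(f1, axis=1)
print(mydf)
```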
However, I am still getting the same error in my real example. If I instead use the column 'AMP' together with another column of my table, everything works and I can create df['GRP'] by applying f to each row. This suggests the problem is related to df['Time']. But then why can I select df[(df['AMP'] > 100) & (df['Time'] > day_1)]? Why does the comparison work in that context but not inside a function?
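A likely explanation is that the two contexts take different comparison paths: df['Time'] > day_1 compares a whole datetime64 Series at once, and the pandas version in use evidently converts the date for that vectorized comparison, whereas inside apply each row['Time'] is a single pd.Timestamp, whose scalar comparison rejects a datetime.date. How pandas treats datetime64-versus-date comparisons has changed between versions, so converting the cut-offs to pd.Timestamp is the safer choice in both places. The sketch below does that and also replaces apply with a vectorized numpy.select; the data and the fixed date are made up for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Made-up data standing in for the real frame.
df = pd.DataFrame({
    'AMP': [150, 150, 150, 50],
    'Time': pd.to_datetime(['2024-01-11 08:00', '2024-01-09 12:00',
                            '2024-01-05 09:00', '2024-01-11 08:00']),
})

today = dt.date(2024, 1, 10)  # fixed for reproducibility; use dt.date.today() in practice
day_1 = pd.Timestamp(today)
day_2 = day_1 - pd.Timedelta(days=1)

# Row-wise access yields scalar Timestamps, not datetime64 Series.
print(isinstance(df.iloc[0]['Time'], pd.Timestamp))  # True

high = df['AMP'] > 100
conditions = [
    high & (df['Time'] > day_1),
    # A chained comparison such as day_2 <= df['Time'] <= day_1 calls bool() on a
    # Series and raises ValueError, so the range check is written out explicitly.
    high & (df['Time'] >= day_2) & (df['Time'] <= day_1),
    high & (df['Time'] < day_2),
]
choices = ['new_positives', 'rec_positives', 'old_positives']
df['GRP'] = np.select(conditions, choices, default='old_negatives')

print(df.groupby('GRP').size())
```

numpy.select picks, for each row, the choice of the first condition that is True, which mirrors the if/elif order of f.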