I have a DataFrame with mixed column types:
import pandas as pd
df = pd.DataFrame({'gender': list('MMMFFF'),
                   'height': [4, 5, 4, 5, 5, 4],
                   'age': [70, 80, 90, 40, 2, 3]})
print(df)
I need to take pairs of columns and filter the DataFrame on the values in those pairs. I have looked at loc and query, and it looks like I need query, but with the query string built programmatically.
This link got me part of the way there, but to really do this I need to detect the types of the values programmatically and format them accordingly in the query. Here is how they do it with known types:
column = ['height', 'age', 'gender']
equal = ['>', '>', '==']
condition = [1.68, 20, 'F']
query = ' & '.join(f'{i} {j} {repr(k)}' for i, j, k in zip(column, equal, condition))
df.query(query)
I don't have the luxury of knowing the types in advance when my pairs of columns come in. Is there a way to make this more flexible by testing the type of each value and then building the right query for it? Specifically, I need to add quotes around a value only when it is a string, and I need to decide that by testing its type.
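For what it's worth, here is a minimal sketch of that type test. It relies on the fact that `repr` returns a quoted literal for a Python string and a bare literal for a Python number, so no explicit `isinstance` check is needed. The helper name `build_query` and the sample conditions are hypothetical, chosen only for illustration. Each condition is wrapped in parentheses so operator precedence is never in question.

```python
import pandas as pd

df = pd.DataFrame({'gender': list('MMMFFF'),
                   'height': [4, 5, 4, 5, 5, 4],
                   'age': [70, 80, 90, 40, 2, 3]})


def build_query(conditions):
    """Build a DataFrame.query string from (column, operator, value) triples.

    Hypothetical helper: backticks protect the column names, and repr()
    (via the !r conversion) quotes strings while leaving numbers bare.
    """
    return ' & '.join(f'(`{col}` {op} {value!r})' for col, op, value in conditions)


# Illustrative conditions mixing numeric and string values
conditions = [('height', '>', 4), ('age', '>', 20), ('gender', '==', 'F')]
query = build_query(conditions)
print(query)

result = df.query(query)
print(result)
```

One caveat: this assumes the values are plain Python objects. NumPy scalars (for example values pulled out of another DataFrame) may not have a `repr` that is a valid literal in newer NumPy versions, so converting them with `.item()` first is probably safer.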
Edit: It looks like I solved this the following way:
yQuery = ' & '.join(['{}=={}'.format(self.query_type_setter(k,True),self.query_type_setter(v, False)) for k, v in yVal])
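def query_type_setter(self, value, isColumn):
    """
    Format a column name or a comparison value for use in a query string.

    :param value: the column name, or the value to compare against
    :param isColumn: True if value is a column name, False if it is a comparison value
    :return: the column name in backticks, a string value in single quotes,
        or any other value unchanged
    """
    if isColumn:
        # Backticks let query() handle column names containing spaces or special characters
        return "`" + value + "`"
    if isinstance(value, str):
        # String values must be quoted inside the query string
        return "'" + value + "'"
    else:
        return value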
I added a helper function, called where the query string is built, that detects string values and quotes them.
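For anyone who would rather avoid the quoting problem altogether, one alternative (not what I used above) is to skip the query string and combine boolean masks, since comparing a column with a value works the same way for strings and numbers. This is only a sketch: `OPS` and `filter_frame` are hypothetical names, and the conditions are made up for the sample frame.

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({'gender': list('MMMFFF'),
                   'height': [4, 5, 4, 5, 5, 4],
                   'age': [70, 80, 90, 40, 2, 3]})

# Map operator strings to the corresponding comparison functions
OPS = {'==': operator.eq, '!=': operator.ne,
       '>': operator.gt, '>=': operator.ge,
       '<': operator.lt, '<=': operator.le}


def filter_frame(frame, conditions):
    """Filter frame on (column, operator, value) triples without a query string."""
    # One boolean Series per condition; no quoting is needed for any value type
    masks = [OPS[op](frame[col], value) for col, op, value in conditions]
    # Start from an all-True mask so an empty condition list keeps every row
    all_true = pd.Series(True, index=frame.index)
    return frame[reduce(operator.and_, masks, all_true)]


filtered = filter_frame(df, [('gender', '==', 'F'), ('height', '==', 5)])
print(filtered)
```

The trade-off is that the operators have to be looked up explicitly, but values never pass through string formatting, so embedded quotes in a string value cannot break the filter.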