1

Assume I have the following table: Original dataframe

I add a column called "status" which is the pairs (gender, senior_management), which is basically the pairs: [ (Female, True), (Male, True), (Male, False), ...] and so on. suppose I am looking for certain conditions, so I defined the list:

conditions = [(Female, True), (Male, False)]

my goal is now to use query to make a new data frame that has only values that have condition. I currently have (note that MyDataframe is the old one and I'm trying to save it as a new one while keeping the old one):

NewDataFrame = MyDataFrame.query('status in @conditions')
NewDataframe.head()

this only results in the column names of the data frame: Flawed_result What is happening here? and how do I fix it?

  • With a dummy dataframe, it works for me. How do you create the 'status' column? it could be as well, that maybe True and False were strings. – MrNobody33 Jul 21 '20 at 03:32
  • interesting! so this same code works for you? i create my new column by doing: `MyDataFrame['status'] = MyDataFrame.apply(lambda row: "({},{})".format(row['Gender'],row['Senior Management']), axis=1)` I checked and the column was created successfully. Any ideas why this is an issue? NOTEWORTHY: When I tried ti print the dataframe using: `print(NewDataFrame)` the output is: Empty DataFrame (Followed by the list of column names) – Beginner101 Jul 21 '20 at 05:10
  • That's the reason why it wasn't working. See my answer. @Beginner101 – MrNobody33 Jul 21 '20 at 05:39
  • Also, if you find it helpful, consider [accepting the answer](https://meta.stackexchange.com/a/5235)! @Beginner101 – MrNobody33 Jul 21 '20 at 05:39
  • Sorry, I don't understand the answer. Could you be more Specific? I don't see how the type would matter? – Beginner101 Jul 21 '20 at 05:41

1 Answers1

0

It seems like the status column it's of type string, because when you use format you cast the expression to a string, so it will not match never with the conditions list of tuples, so you can try to define the conditions list as strings:

import pandas as pd

df=pd.DataFrame({'gender':['Male','Female','Male','Female'],'Senior':[True,True,False,False]})
df['status']=df.apply(lambda row: "({},{})".format(row['gender'],row['Senior']), axis=1)
df
#   gender  Senior          status
#0    Male    True     (Male,True)
#1  Female    True   (Female,True)
#2    Male   False    (Male,False)
#3  Female   False  (Female,False)
conditions = ['(Female,True)', '(Male,False)']

df.query('status in @conditions')

Output:

   gender  Senior         status
1  Female    True  (Female,True)
2    Male   False   (Male,False)

If you want to have as a tuple instead of a string, you could try this to get the tuples, and then make the query

df=pd.DataFrame({'gender':['Male','Female','Male','Female'],'Senior':[True,True,False,False]}) 

df['status']=list(zip(df.gender, df.Senior))

conditions = [('Female',True), ('Male',False)]
df.query('status in @conditions')
MrNobody33
  • 6,413
  • 7
  • 19