
I have a list of survey questions, `list_of_columns`, that are also columns in my dataframe. At the start of my code I replaced all blank responses with 'Empty'. However, some respondents skipped questions they were supposed to answer: they have a '1' in the 'EOPS/CARE' or 'CalWORKs' column of my dataframe but 'Empty' in a question pertaining to that program. In these cases I want to recode 'Empty' as 'Missing' to reflect that accurately.

Here is the code I have to try and remedy that:

list_of_columns = ['E1', 'E2', 'E3', 'E5', 'E11', 'E13', 'E14', 'E17', 'E18', 'E20', 'C2', 'C7', 'C8', 'C9', 'C11', 'C12', 'NU2', 'NU7', 'NU8', 'NU10', 'NU11', 'CAL1', 'CAL2', 'CAL3', 'CAL5', 'CAL10', 'CAL12', 'CAL14', 'CAL15', 'O1'] # list of survey questions that are also columns in my df. Questions with 'E' are related to EOPS/CARE, questions with 'CAL' are related to CalWORKs, etc.

for question in list_of_columns:

    if 'E' in question and data_final['EOPS/CARE'] == 1: # if 'E' is in the question, and the column 'EOPS/CARE' in my df is equal to 1, replace all instances of "Empty" with "Missing"

        data_final[question] = np.where(data_final[question] == "Empty", "Missing", data_final[question])

    elif 'CAL' in question and data_final['CalWORKs'] == 1: # similarly, if 'CAL' is in the question, and the column 'CalWORKs' in my df is equal to 1, replace all instances of "Empty" with "Missing"

        data_final[question] = np.where(data_final[question] == "Empty", "Missing", data_final[question])

    else:

        pass

I keep getting this when I try to execute: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

This would work fairly easily in Stata, but I'm determined to do this in Python as the rest of my code is already in Python. I'm still learning the language so it could be due to syntax. Thanks so much!

  • What is the issue, exactly? Have you read the Pandas docs? – AMC Jan 04 '20 at 22:01
  • hi @AMC, the code isn't working for some reason. The error I get when I try to execute is "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." – pythonnewbiebb Jan 04 '20 at 22:04
  • Is that the entire error message? Have you tried doing some research on that error? I know I've personally seen the same issue on here a few times recently. In any case there's not enough here to reproduce the error, so all I can do is direct you to the docs and to those other questions. – AMC Jan 04 '20 at 22:07
  • In my opinion this should be closed as a duplicate of https://stackoverflow.com/q/36921951/11301900. – AMC Jan 04 '20 at 22:08
  • Thanks for the links. I tried these out, but even when I use bitwise '&' in place of my 'and', I get a new error: "TypeError: cannot compare a dtyped [int64] array with a scalar of type [bool]", and the problem line is line 5 ("if 'E' in question & data_final['EOPS/CARE']..."). I'm at my wits' end! – pythonnewbiebb Jan 04 '20 at 22:25
  • Can you share your entire program, plus data? As an aside, you should probably use `NaN` for missing data, rather than a bunch of different strings like `"MISSING"` or `"EMPTY"`. – AMC Jan 04 '20 at 22:29
  • As for the error, where is it occurring? Don't you get a traceback as part of the message? – AMC Jan 04 '20 at 22:31
  • Unfortunately I'm not allowed to share the data. I can see if changing missings to `NaN` may solve my issue, I'll report back. Thanks for your patience, as I said I'm somewhat new to Python and this is also my first time asking a question on stackoverflow. – pythonnewbiebb Jan 04 '20 at 22:40
  • It doesn't matter if it's not exactly the same data, it can be some dummy/test data. What's important is that anyone on here be able to reproduce your program exactly. – AMC Jan 04 '20 at 22:41
  • Oops! I should have added that no, changing the appropriate values to `NaN` probably won't resolve this. – AMC Jan 04 '20 at 23:08
  • The problem is that you are combining a check on a single list item with a comparison against an entire column. The comparison returns True or False for every row, not a single value, so you need to say which rows the condition applies to. Since the items in list_of_columns are also columns in the df, use .loc with the conditions and then do what is required. – Sid Jan 05 '20 at 01:16
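
To illustrate the point in the comment above, here is a minimal sketch of why the `if` statement fails. The Series `flags` is a made-up stand-in for `data_final['EOPS/CARE']`, since the real data was not shared:

```python
import pandas as pd

# Hypothetical stand-in for data_final['EOPS/CARE'] (not the asker's real data)
flags = pd.Series([1, 0, 1])

# Comparing a Series to a scalar is element-wise: one boolean per row
row_mask = flags == 1
print(row_mask.tolist())

try:
    # `if` needs a single True/False, but row_mask holds one value per row
    if row_mask:
        pass
except ValueError as err:
    error_text = str(err)
    print(error_text)
```

So the comparison is best treated as a row filter rather than as a condition for `if`.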

1 Answer


This is just a quick way to get the columns to where you want them.

# the names in list_of_columns are column labels in df, so select one by name
# ('E1' stands for any EOPS/CARE question) and filter rows with a boolean mask

df.loc[(df['EOPS/CARE'] == 1) & (df['E1'] == 'Empty'), 'E1'] = 'Missing'

Do the same thing to get the other condition to work. I can't understand the rest of the question but if you clarify I can help further.

.loc will help you the most. Check the docs.
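
For a fuller picture, here is a rough sketch of how the `.loc` approach might be applied across all question columns. The dummy dataframe, the shortened `list_of_columns`, and the `program_for_prefix` mapping are assumptions made up for illustration. It also uses `startswith` rather than `in`, on the assumption that the program is identified by the start of the question name:

```python
import pandas as pd

# Hypothetical dummy data standing in for data_final (the real data was not shared)
data_final = pd.DataFrame({
    "EOPS/CARE": [1, 0, 1],
    "CalWORKs": [0, 1, 1],
    "E1": ["Empty", "Empty", "Yes"],
    "CAL1": ["Empty", "Empty", "No"],
    "O1": ["Empty", "Maybe", "Empty"],
})

# Shortened list of question columns, for illustration only
list_of_columns = ["E1", "CAL1", "O1"]

# Assumed mapping from question prefix to the program indicator column
program_for_prefix = {"E": "EOPS/CARE", "CAL": "CalWORKs"}

for question in list_of_columns:
    for prefix, program in program_for_prefix.items():
        # Plain string test on the column name: a single True/False, safe in `if`
        if question.startswith(prefix):
            # Row-wise mask: respondent is in the program AND left the question blank
            mask = (data_final[program] == 1) & (data_final[question] == "Empty")
            data_final.loc[mask, question] = "Missing"

print(data_final)
```

The string test on the column name stays in the `if`, while the per-row condition moves into the mask passed to `.loc`. Each side of `&` is wrapped in parentheses because `&` binds more tightly than `==`.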

Sid