How to change column values or create values in a new column based on values in existing column?

Question

I am new to ML and Data Science (I recently graduated with a Master's in Business Analytics) and am learning as much as I can on my own while looking for positions in Data Science / Business Analytics.

I am working on a personal project: building ML models to predict whether a customer will show up for an existing appointment.

During initial data analysis, I noticed that my "No-show" column contains the values "Yes" and "No". (Metadata: if a customer scheduled an appointment and showed up, the "No-show" value is "No"; if a customer scheduled an appointment and did not show up, the "No-show" value is "Yes".) For the ML algorithms, I need "Yes" to become 1 and "No" to become 0.
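One way to express the "Yes" -> 1, "No" -> 0 mapping directly is `Series.map` with a dictionary. This is a minimal sketch on a small made-up DataFrame (the sample rows are hypothetical, not from the real dataset); it overwrites the "No-show" column in place. Any value not listed in the dictionary becomes NaN, so a quick `isna()` check catches unexpected labels.

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})

# Replace the labels in place: "Yes" -> 1, "No" -> 0.
# Values missing from the dict would become NaN instead of raising an error.
my_df["No-show"] = my_df["No-show"].map({"Yes": 1, "No": 0})

# Cheap sanity check for labels other than "Yes"/"No"
if my_df["No-show"].isna().any():
    raise ValueError("No-show contains values other than 'Yes'/'No'")

print(my_df["No-show"].tolist())  # [0, 1, 0, 1]
```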

I see two ways to tackle this problem:

1. write code that changes the values of the "No-show" column
2. create a new "Outcome" column whose values depend on the values in "No-show"
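For the approach that creates a new "Outcome" column and leaves "No-show" untouched, a vectorized sketch with `numpy.where` could look like this (hypothetical sample rows again). `np.where(condition, 1, 0)` picks 1 wherever the condition is True and 0 everywhere else, so note that any label other than "Yes", including typos, silently ends up as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})

# New column: 1 where "No-show" is "Yes", 0 otherwise; "No-show" is not modified
my_df["Outcome"] = np.where(my_df["No-show"] == "Yes", 1, 0)

print(my_df)
```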

I tried writing code for both approaches, but I keep getting errors. Below are my attempts (method 1 for the first approach, methods 2.1 and 2.2 for the second); none of them work. Thanks in advance for your help!

1.

```python
if my_df["No-show"] == "Yes":
    my_df["No-show"] == 1
elif my_df["Outcome"] == "No":
    my_df["No-show"] == 0
else:
    print("Something went wrong")
```

Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
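The error is raised by the `if` itself: `my_df["No-show"] == "Yes"` is evaluated for every row and returns a boolean Series, while `if` needs a single True/False, so pandas raises a `ValueError` instead of guessing whether "any row" or "all rows" should count. (Separately, `my_df["No-show"] == 1` in the body only compares; assignment uses a single `=`.) A minimal reproduction on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})

# The comparison runs row by row and returns a boolean Series, not one bool
mask = my_df["No-show"] == "Yes"
print(mask.tolist())  # [False, True, False, True]

# An `if` needs a single truth value, so pandas raises instead of guessing
message = None
try:
    if mask:
        pass
except ValueError as err:
    message = str(err)
print(message)
```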

2.1

```python
my_df["Outcome"] = 0

if my_df["No-show"] == "Yes":
    my_df["Outcome"] == 1
elif my_df["No-show"] == "No":
    my_df["Outcome"] == 0
else:
    print("Something went wrong")
```

Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
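Attempt 2.1 can keep its structure (start everything at 0, then set 1 for "Yes") if the `if` is replaced by a boolean mask passed to `.loc`. This is a sketch on hypothetical sample rows: `.loc[mask, "Outcome"] = 1` assigns only to the rows where the mask is True, and it uses `=` rather than `==`.

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})

# Start with 0 for every row, as in attempt 2.1
my_df["Outcome"] = 0

# Assign 1 only on the rows where "No-show" is "Yes" (single "=" assigns)
my_df.loc[my_df["No-show"] == "Yes", "Outcome"] = 1

print(my_df)
```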

2.2

```python
my_df["Outcome"] = 0

for val in my_df.iterrows():
    if my_df["No-show"] == "Yes":
        my_df["Outcome"] == 1
    elif my_df["No-show"] == "No":
        my_df["Outcome"] == 0
    else:
        print("Something went wrong")
```

Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
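Attempt 2.2 fails because the loop body still compares the whole column instead of the current row. `DataFrame.iterrows()` yields `(index, row)` pairs, so a working loop version (sketched here on hypothetical sample rows) compares `row["No-show"]`, a single string, and writes back with `.at`. It works, but a Python-level loop is much slower than the vectorized versions on a large DataFrame.

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})
my_df["Outcome"] = 0

# iterrows() yields (index, row); compare the single value in `row`,
# not the whole column, and write the result back with .at
for idx, row in my_df.iterrows():
    if row["No-show"] == "Yes":
        my_df.at[idx, "Outcome"] = 1
    elif row["No-show"] == "No":
        my_df.at[idx, "Outcome"] = 0
    else:
        print(f"Something went wrong in row {idx}: {row['No-show']!r}")

print(my_df)
```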

Thank you for your help, and congratulate me on my first question on Stack Overflow! Looking forward to giving back to the community! :)

Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) — AMC, Mar 25 '20 at 18:38
Have you looked at the [Pandas docs](https://pandas.pydata.org/docs/)? — AMC, Mar 25 '20 at 18:39
@Arsik36 One issue is that you aren't using `val` in the for loop. Also, there's a more succinct way to do it: `my_df['Outcome'] = (my_df['No-show'] == 'Yes').astype(int)` — Chris, Mar 25 '20 at 18:39
As an aside, having a variable `"No-show"` is going to get confusing. Also, why use 0/1 instead of actual bools? — AMC, Mar 25 '20 at 18:40
@Chris Thank you, this helped! One beginner-style question: how does Python know to translate the "Yes" value to 1 in the new column, since you didn't explicitly specify that in your code? Thank you for your help! — Arsik36, Mar 25 '20 at 18:51
@AMC Thank you for your feedback. This is exactly why I am creating a new "Outcome" column: to avoid having odd characters in column names. — Arsik36, Mar 25 '20 at 18:51
`my_df['No-show'] == 'Yes'` will return a `pandas.Series` of the same length (number of rows) as `my_df`, with `True` or `False` for each row depending on whether the `No-show` column matches the given string (`'Yes'` in this case). Using the `astype()` method converts those boolean values to integers (`True` becomes `1` and `False` becomes `0`). This is assigned via the usual `=` operator to `my_df['Outcome']`, which creates the new column in the dataframe. As @AMC suggested, you will do well to read the pandas documentation – it's really clear and concise. — Chris, Mar 25 '20 at 18:55
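The explanation above of `(my_df['No-show'] == 'Yes').astype(int)` can be checked step by step on a few hypothetical sample rows. The last lines sketch the alternative raised in the comments of keeping real booleans instead of 0/1; the column name `no_show` is only an illustrative choice.

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointment dataset
my_df = pd.DataFrame({"No-show": ["No", "Yes", "No", "Yes"]})

# Step 1: element-wise comparison gives one True/False per row
is_no_show = my_df["No-show"] == "Yes"
print(is_no_show.tolist())  # [False, True, False, True]

# Step 2: astype(int) turns True into 1 and False into 0
as_int = is_no_show.astype(int)
print(as_int.tolist())  # [0, 1, 0, 1]

# Step 3: assigning to a label that does not exist yet creates the column
my_df["Outcome"] = as_int
print(my_df.columns.tolist())  # ['No-show', 'Outcome']

# Alternative: keep the boolean Series itself under a cleaner,
# illustrative column name ("no_show" is hypothetical)
my_df["no_show"] = is_no_show
print(my_df["no_show"].sum())  # 2 (True counts as 1 when summed)
```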
