I am new to ML and Data Science (I recently graduated with a Master's in Business Analytics) and am learning as much as I can on my own while looking for positions in Data Science / Business Analytics.
I am working on a personal project to build ML models that predict whether a customer will show up for an existing appointment.
During initial data analysis, I noticed that my "No-show" column contains the values "Yes" and "No". Per the metadata, the value is "No" if a customer scheduled an appointment and showed up, and "Yes" if a customer scheduled an appointment and did not show up. For ML algorithms, I need "Yes" to become 1 and "No" to become 0.
I see 2 ways to tackle this problem:
- write code to change the values of the "No-show" column in place
- create a new "Outcome" column whose values depend on the values in "No-show"
I tried writing code for both cases, but I keep getting errors. Below are my attempts at the 2 methods; neither works. I appreciate your help in advance!
1.
if my_df["No-show"] == "Yes":
    my_df["No-show"] == 1
elif my_df["Outcome"] == "No":
    my_df["No-show"] == 0
else:
    print("Something went wrong")
Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
2.1
my_df["Outcome"] = 0
if my_df["No-show"] == "Yes":
    my_df["Outcome"] == 1
elif my_df["No-show"] == "No":
    my_df["Outcome"] == 0
else:
    print("Something went wrong")
Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
2.2
my_df["Outcome"] = 0
for val in my_df.iterrows():
    if my_df["No-show"] == "Yes":
        my_df["Outcome"] == 1
    elif my_df["No-show"] == "No":
        my_df["Outcome"] == 0
    else:
        print("Something went wrong")
Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
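For reference, here is a minimal sketch of what a vectorized version of both approaches might look like. It assumes pandas and uses a small made-up sample frame in place of the real data; the sample values and the `is_no_show` name are illustrative, not from the original dataset. The comparison `my_df["No-show"] == "Yes"` produces one True/False per row rather than a single value, which appears to be what the error message is complaining about, whereas `Series.map` applies the conversion to every row at once:

```python
import pandas as pd

# Hypothetical sample data standing in for the real appointments table.
my_df = pd.DataFrame({"No-show": ["Yes", "No", "No", "Yes"]})

# Comparing a column to a string yields a boolean Series (one value per row),
# so a plain `if` has no single True/False to test.
is_no_show = my_df["No-show"] == "Yes"

# Approach 2: build a new "Outcome" column with an explicit mapping.
# Any value other than "Yes"/"No" becomes NaN, which makes it easy to detect.
my_df["Outcome"] = my_df["No-show"].map({"Yes": 1, "No": 0})

# Replacement for the "Something went wrong" branch: count unmapped rows.
unexpected = int(my_df["Outcome"].isna().sum())
if unexpected:
    print(f"Something went wrong: {unexpected} unexpected value(s)")

# Approach 1: overwrite the "No-show" column in place with the same mapping.
my_df["No-show"] = my_df["No-show"].map({"Yes": 1, "No": 0})
```

Note also that `my_df["Outcome"] == 1` in the attempts above is a comparison, not an assignment (`=`), so it would not change the column even if the `if` test succeeded.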
Thank you for your help, and congratulate me on my first question on StackOverflow! Looking forward to giving back to the community! :)