I am just trying out Pandas and am not sure why I am not getting the proper output (titanic dataset from seaborn).
The Status column should show "f" wherever the "sex" column says female. (Image Attached)
The mistake is in the assignment data['Status'] = 'm'. It sets every value in this column to 'm'. To correct this while keeping your approach, you can iterate over the rows:
# Assumes the default RangeIndex (0, 1, 2, ...), because .loc looks rows up by label
for index in range(data.shape[0]):
    if data.loc[index, 'sex'] == 'male':
        data.loc[index, 'Status'] = 'm'
    else:
        data.loc[index, 'Status'] = 'f'
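To see both the bug and the loop fix on a small scale, here is a sketch that uses a hypothetical four-row DataFrame in place of the seaborn titanic data (seaborn.load_dataset needs a network connection). It shows that assigning a scalar broadcasts it to every row, and that the loop then overwrites each row individually.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data; only the "sex" column matters here.
data = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# Assigning a single value broadcasts it to every row: this is the original bug.
data["Status"] = "m"
print(data["Status"].tolist())  # ['m', 'm', 'm', 'm']

# The loop relies on the default RangeIndex (0..n-1) that this frame has.
for index in range(data.shape[0]):
    if data.loc[index, "sex"] == "male":
        data.loc[index, "Status"] = "m"
    else:
        data.loc[index, "Status"] = "f"

print(data["Status"].tolist())  # ['m', 'f', 'f', 'm']
```

If the frame has been filtered or re-indexed, range(data.shape[0]) no longer matches the labels, so iterating over data.index would be the safer choice.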
There is another, more efficient solution using map:
val_dict = {'female': 'f', 'male': 'm'}
df['Status'] = df['sex'].map(val_dict)
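One thing worth knowing about map with a dictionary: values that are not keys in the dictionary become NaN rather than falling back to a default. The sketch below uses a hypothetical frame with an unexpected "Female" spelling to show this.

```python
import pandas as pd

# Hypothetical data: the third value does not match any dictionary key.
df = pd.DataFrame({"sex": ["male", "female", "Female"]})

val_dict = {"female": "f", "male": "m"}
df["Status"] = df["sex"].map(val_dict)

# The first two rows are mapped; the unmatched "Female" row ends up missing (NaN).
print(df["Status"].tolist())
print(df["Status"].isna().tolist())  # [False, False, True]
```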
A good solution has been provided by @Phoenix based on what you have tried so far. But since this is a simple if-else condition, you can also use the numpy.where function:
import numpy as np
data["Status"] = np.where(data["sex"] == "female", "f", "m")
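np.where has exactly two branches, so every row where the condition is False gets the second value, including rows where "sex" is missing. A small sketch with a hypothetical frame that contains a None:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in "sex".
data = pd.DataFrame({"sex": ["male", "female", None]})

# None == "female" evaluates to False, so the missing row falls into the "m" branch.
data["Status"] = np.where(data["sex"] == "female", "f", "m")
print(data["Status"].tolist())  # ['m', 'f', 'm']
```

For the titanic "sex" column this is harmless, but with messier data the map or .loc variants make unmatched rows easier to spot.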
I generally prefer a more 'pythonic' approach to this, using boolean masks with the .loc indexer:
df.loc[df['sex'] == 'female', 'Status'] = 'f'
df.loc[df['sex'] == 'male', 'Status'] = 'm'
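Each .loc assignment only touches the rows selected by its mask. If the column does not exist yet, the first assignment creates it and fills the unselected rows with NaN, so rows matching neither mask stay NaN. A sketch with a hypothetical frame containing a None:

```python
import pandas as pd

# Hypothetical data; the last row matches neither mask.
df = pd.DataFrame({"sex": ["male", "female", None]})

# The first assignment creates "Status"; non-matching rows start as NaN.
df.loc[df["sex"] == "female", "Status"] = "f"
# The second assignment fills in only the "male" rows.
df.loc[df["sex"] == "male", "Status"] = "m"

print(df["Status"].isna().tolist())  # [False, False, True]
```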
Another approach is to use apply with a lambda function:
df['Status'] = df['sex'].apply(lambda x: 'm' if x == 'male' else 'f')
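The comparison inside the lambda is case-sensitive, and the "sex" column stores lowercase values ("male", "female"). Comparing against "Male" would match nothing, so every row would fall into the else branch and get "f". A small sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row stand-in for the titanic "sex" column.
df = pd.DataFrame({"sex": ["male", "female"]})

# Capitalised "Male" never matches, so both rows get "f".
wrong = df["sex"].apply(lambda x: "m" if x == "Male" else "f")
print(wrong.tolist())  # ['f', 'f']

# Lowercase "male" matches the stored values.
right = df["sex"].apply(lambda x: "m" if x == "male" else "f")
print(right.tolist())  # ['m', 'f']
```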
You should avoid explicit for-loops where possible; in fact, one of the main reasons to use a pandas DataFrame is to avoid writing them.
If you care about performance, I believe .loc is faster. Ciao!
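Rather than relying on belief, the approaches can be timed on your own machine. The sketch below uses a hypothetical synthetic column of 100,000 random "male"/"female" values in place of the titanic data. It first checks that all approaches produce the same labels, then times each with timeit. No particular result is claimed here, since timings depend on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the titanic "sex" column (assumption: only two values occur).
rng = np.random.default_rng(0)
df = pd.DataFrame({"sex": rng.choice(["male", "female"], size=100_000)})


def with_map(frame):
    return frame["sex"].map({"female": "f", "male": "m"})


def with_where(frame):
    return pd.Series(np.where(frame["sex"] == "female", "f", "m"), index=frame.index)


def with_loc(frame):
    # Start from an all-missing object Series and fill it mask by mask.
    status = pd.Series(index=frame.index, dtype=object)
    status.loc[frame["sex"] == "female"] = "f"
    status.loc[frame["sex"] == "male"] = "m"
    return status


def with_apply(frame):
    return frame["sex"].apply(lambda x: "m" if x == "male" else "f")


approaches = {"map": with_map, "np.where": with_where, ".loc": with_loc, "apply": with_apply}

# Timing only makes sense if every approach returns the same labels.
reference = with_map(df).tolist()
for name, func in approaches.items():
    assert func(df).tolist() == reference, name

for name, func in approaches.items():
    seconds = timeit.timeit(lambda: func(df), number=10)
    print(f"{name:>9}: {seconds / 10 * 1000:.2f} ms per call")
```

The row-by-row loop from the first answer is left out because it is typically much slower than the others at this size.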