I am just trying out Pandas and am not sure why I am not getting the expected output (Titanic dataset from seaborn).

The "Status" column should show "f" wherever the "sex" column says female. (Image attached)

The picture of the dataframe

TheFaultInOurStars
raj kumar

3 Answers

The mistake is in the assignment data['Status'] = 'm', which sets every value in the column to 'm'. To correct this while keeping your approach, you can iterate over the rows:

# Assumes the default integer index (0, 1, 2, ...), so each position is also a valid .loc label
for index in range(data.shape[0]):
    if data.loc[index, 'sex'] == 'male':
        data.loc[index, 'Status'] = 'm'
    else:
        data.loc[index, 'Status'] = 'f'

A more efficient solution uses map with a dictionary:

val_dict = {'female': 'f', 'male': 'm'}
data['Status'] = data['sex'].map(val_dict)
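
As a minimal sketch of the map approach, assuming a small hand-made frame in place of the Titanic data (the frame and its values below are made up for illustration): any value of "sex" that is not a key in the dictionary comes out as NaN, which makes unexpected categories easy to spot.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame; the values are illustrative only.
data = pd.DataFrame({"sex": ["male", "female", "female", "unknown"]})

val_dict = {"female": "f", "male": "m"}
data["Status"] = data["sex"].map(val_dict)

# Values absent from val_dict ("unknown" here) become NaN instead of "m" or "f".
print(data)
```

If silent NaN values are not wanted, they can be filled afterwards, for example with data["Status"].fillna("?").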
Hamzah

A good solution has already been provided by @Phoenix, based on what you have tried so far. For a simple if-else condition, you can also use the numpy.where function:

import numpy as np
data["Status"] = np.where(data["sex"] == "female", "f", "m")
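
One thing worth knowing about this form, sketched here on a made-up three-row frame (not the real Titanic data): the third argument of np.where is a catch-all, so every row that is not "female", including rows with a missing value, ends up as "m".

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in "sex"; illustrative only.
data = pd.DataFrame({"sex": ["male", "female", None]})

# Condition is True only for "female"; everything else falls into the "m" branch.
data["Status"] = np.where(data["sex"] == "female", "f", "m")

print(data)
```

For the seaborn Titanic data this should not matter as long as "sex" has no missing values, but it is worth checking on other datasets.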
TheFaultInOurStars

I generally prefer a more 'pythonic' approach to this, using the .loc indexer:

df.loc[df['sex'] == 'female', 'Status'] = 'f'
df.loc[df['sex'] == 'male', 'Status'] = 'm'

Another approach uses a lambda function (note that the values in "sex" are lowercase, so the comparison must be against 'male'):

df['Status'] = df['sex'].apply(lambda x: 'm' if x == 'male' else 'f')

You should avoid explicit for-loops where possible; one of the main reasons to use a Pandas DataFrame is that column-wise operations replace row-by-row loops.

If you care about performance, I believe .loc is faster. Ciao!
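
Rather than taking the speed claim on faith, it can be measured. The following is a rough sketch, assuming a synthetic 10,000-row frame and hypothetical helper names (with_loc, with_map, with_where, with_apply); it first collects each approach's result so they can be compared, then times each one. Actual timings depend on the machine, the pandas version and the data size, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows alternating between the two categories.
df = pd.DataFrame({"sex": ["male", "female"] * 5_000})

def with_loc(frame):
    out = frame.copy()  # copy so the input frame is not modified
    out.loc[out["sex"] == "female", "Status"] = "f"
    out.loc[out["sex"] == "male", "Status"] = "m"
    return out["Status"]

def with_map(frame):
    return frame["sex"].map({"female": "f", "male": "m"})

def with_where(frame):
    return pd.Series(np.where(frame["sex"] == "female", "f", "m"), index=frame.index)

def with_apply(frame):
    return frame["sex"].apply(lambda x: "m" if x == "male" else "f")

candidates = {"loc": with_loc, "map": with_map, "where": with_where, "apply": with_apply}

# Collect each approach's output as a plain list so the results can be compared.
results = {name: fn(df).tolist() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f"{name}: {seconds:.4f} s for 20 runs")
```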

Lorenzo Bassetti