
I'm working on the Titanic dataset available on Kaggle:

https://www.kaggle.com/c/titanic/data

I'm trying to handle all the titles contained in the passengers' names.

I can filter with the 'contains' method, as below, to display the names whose titles I haven't handled:

train[~train.Name.str.contains('Mr.|Mrs.|Miss.|Master.|Dr.|Rev.|Jonkheer.|Countess.|Major.|Col.|Capt.|Don.|Mme.|Mlle.')]['Name']

and it displays what I haven't yet captured:

443    Reynaldo, Ms. Encarnacion
Name: Name, dtype: object

So I created a mapper function to create another feature:

## title mapper function
def title_mapper(x):
    if x.contains('Mr.'):
        return 'Mr'
    elif x.contains('Mrs.|Mme.'):
        return 'Mrs'
    elif x.contains('Miss.|Mlle.'):
        return 'Miss'
    elif x.contains('Dr.'):
        return 'Dr'
    elif x.contains('Rev.'):
        return 'Cleric'
    elif x.contains('Jonkheer.|Countess.|Don.|Ms.'):
        return 'Noble'
    elif x.contains('Major.|Col.|Capt.'):
        return 'Military'
    else:
        return 'Other'

But it raises an error saying there is no attribute 'contains':

train['Title'] = train['Name'].apply(lambda x: title_mapper(x))


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-7c9804f87141> in <module>
     20         return 'Other'
     21 
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))

~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3589             else:
   3590                 values = self.astype(object).values
-> 3591                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3592 
   3593         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-63-7c9804f87141> in <lambda>(x)
     20         return 'Other'
     21 
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))

<ipython-input-63-7c9804f87141> in title_mapper(x)
      3 ## title mapper function
      4 def title_mapper(x):
----> 5     if x.contains('Mr.'):
      6         return 'Mr'
      7     elif x.contains('Mrs.|Mme.'):

AttributeError: 'str' object has no attribute 'contains'

I reviewed this question and its answers and adjusted my code:

Does Python have a string 'contains' substring method?

But as I understand it, you cannot pass multiple patterns like this, even if the string is prefixed with r'':

'Capt.|Col.'

I'm using Python 3.7. It only worked when I hard-coded each value. Is there a better or more efficient way to do it?

## title mapper function
def title_mapper(x):
    if 'Mr.' in x:
        return 'Mr'
    elif 'Mrs.' in x:
        return 'Mrs'
    elif 'Mme.' in x:
        return 'Mrs'
    elif 'Miss.' in x:
        return 'Miss'
    elif 'Mlle.' in x:
        return 'Miss'
    elif 'Dr.' in x:
        return 'Dr'
    elif 'Rev.' in x:
        return 'Cleric'
    elif 'Jonkheer.' in x:
        return 'Noble'
    elif 'Countess.' in x:
        return 'Noble'
    elif 'Don.' in x:
        return 'Noble'
    elif 'Ms.' in x:
        return 'Noble'
    elif 'Major.' in x:
        return 'Military'
    elif 'Col.' in x:
        return 'Military'
    elif 'Capt.' in x:
        return 'Military'
    else:
        return 'Other'

train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
Bartek Malysz

1 Answer

If performance is important, use your last solution. It is also possible to rewrite it with a dictionary as the mapper:

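# Output label -> substrings that identify it
d = {'Mr': ['Mr.'],
     'Mrs': ['Mrs.', 'Mme.'],
     'Miss': ['Miss.', 'Mlle.'],
     'Dr': ['Dr.'],
     'Cleric': ['Rev.'],
     'Noble': ['Jonkheer.', 'Countess.', 'Don.', 'Ms.'],
     'Military': ['Major.', 'Col.', 'Capt.']}
# Invert to substring -> output label
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

def title_mapper1(x):
    # Return the label of the first substring found in the name
    for k, v in d1.items():
        if k in x:
            return v
    # No match: the function returns None, which fillna replaces below

train['Title1'] = train['Name'].apply(title_mapper1).fillna('Other')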
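A possible vectorized variant, offered only as a sketch: 'contains' belongs to the pandas '.str' accessor, not to the plain Python strings that 'apply' passes to the function, which is why the first attempt raised AttributeError. Staying on the accessor, 'str.extract' can pull out the title and 'map' can translate it. The sample names, the 'title_map' dictionary, and the assumption that the title is the first run of letters followed by a period are illustrative and not taken from the original code.

```python
import pandas as pd

# Stand-in for train['Name']; the last entry is a made-up name
# used only to show the 'Other' fallback.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Aubart, Mme. Leontine Pauline",
    "Reynaldo, Ms. Encarnacion",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
    "Weir, Col. John",
    "Doe, Xyz. Hypothetical",
])

# Raw title (without the period) -> grouped label, mirroring the
# grouping used in the mapper functions above.
title_map = {
    'Mr': 'Mr',
    'Mrs': 'Mrs', 'Mme': 'Mrs',
    'Miss': 'Miss', 'Mlle': 'Miss',
    'Dr': 'Dr',
    'Rev': 'Cleric',
    'Jonkheer': 'Noble', 'Countess': 'Noble', 'Don': 'Noble', 'Ms': 'Noble',
    'Major': 'Military', 'Col': 'Military', 'Capt': 'Military',
}

# Assumption: the title is the first run of letters immediately
# followed by a literal period (note the escaped dot).
raw_titles = names.str.extract(r'([A-Za-z]+)\.', expand=False)

# Titles missing from title_map become NaN and are filled with 'Other'.
titles = raw_titles.map(title_map).fillna('Other')
print(titles.tolist())
```

On the real data this would be applied as train['Name'].str.extract(...) in place of the sample Series. Because the period is escaped and the whole title is captured, 'Mr' and 'Mrs' cannot be confused, which the unescaped 'Mr.' pattern in the 'contains' filter does not guarantee.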
jezrael