I'm working on the Titanic dataset available on Kaggle:
https://www.kaggle.com/c/titanic/data
I'm trying to handle all the titles contained within a passenger's name.
I'm able to filter using the 'contains' method, as below, to display the values:
train[~train.Name.str.contains('Mr.|Mrs.|Miss.|Master.|Dr.|Rev.|Jonkheer.|Countess.|Major.|Col.|Capt.|Don.|Mme.|Mlle.')]['Name']
and it displays what I haven't yet captured:
443 Reynaldo, Ms. Encarnacion
Name: Name, dtype: object
So I created a mapper function to create another feature:
## title mapper function
def title_mapper(x):
    if x.contains('Mr.'):
        return 'Mr'
    elif x.contains('Mrs.|Mme.'):
        return 'Mrs'
    elif x.contains('Miss.|Mlle.'):
        return 'Miss'
    elif x.contains('Dr.'):
        return 'Dr'
    elif x.contains('Rev.'):
        return 'Cleric'
    elif x.contains('Jonkheer.|Countess.|Don.|Ms.'):
        return 'Noble'
    elif x.contains('Major.|Col.|Capt.'):
        return 'Military'
    else:
        return 'Other'
But it claims that there is no attribute 'contains':
train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-63-7c9804f87141> in <module>
20 return 'Other'
21
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3589 else:
3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
3592
3593 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-63-7c9804f87141> in <lambda>(x)
20 return 'Other'
21
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
<ipython-input-63-7c9804f87141> in title_mapper(x)
3 ## title mapper function
4 def title_mapper(x):
----> 5 if x.contains('Mr.'):
6 return 'Mr'
7 elif x.contains('Mrs.|Mme.'):
AttributeError: 'str' object has no attribute 'contains'
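The traceback suggests that Series.apply hands each name to title_mapper as a plain Python str, which has no contains method; contains lives on the .str accessor of a Series. Below is a minimal sketch of a per-string check that still accepts several alternatives in one pattern, using re.search from the standard library. The helper name has_title and the second and third sample names are made up for illustration.

```python
import re

def has_title(name, pattern):
    """Return True if the regex pattern matches anywhere in name."""
    return re.search(pattern, name) is not None

# Dots are escaped so they match a literal '.', not any character.
print(has_title('Reynaldo, Ms. Encarnacion', r'Ms\.'))
print(has_title('Doe, Capt. John', r'Major\.|Col\.|Capt\.'))
print(has_title('Doe, Mr. John', r'Mrs\.|Mme\.'))
```

With a helper like this, each branch of the mapper could test one alternation pattern instead of calling a method that str does not have.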
I reviewed this question and its answers and adjusted my code:
Does Python have a string 'contains' substring method?
But as I understand it, you cannot pass multiple patterns like this, even if the string is a raw string with a preceding r''. I'm using Python 3.7.
'Capt.|Col.'
It only worked when I hard-coded each value. Is there a better, more efficient way to do this?
## title mapper function
def title_mapper(x):
    if 'Mr.' in x:
        return 'Mr'
    elif 'Mrs.' in x:
        return 'Mrs'
    elif 'Mme.' in x:
        return 'Mrs'
    elif 'Miss.' in x:
        return 'Miss'
    elif 'Mlle.' in x:
        return 'Miss'
    elif 'Dr.' in x:
        return 'Dr'
    elif 'Rev.' in x:
        return 'Cleric'
    elif 'Jonkheer.' in x:
        return 'Noble'
    elif 'Countess.' in x:
        return 'Noble'
    elif 'Don.' in x:
        return 'Noble'
    elif 'Ms.' in x:
        return 'Noble'
    elif 'Major.' in x:
        return 'Military'
    elif 'Col.' in x:
        return 'Military'
    elif 'Capt.' in x:
        return 'Military'
    else:
        return 'Other'
train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
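One possible way to avoid the long if/elif chain, sketched under assumptions rather than offered as the definitive fix: pull the title out once with Series.str.extract and translate it through a dictionary. The names TITLE_GROUPS and add_title are hypothetical, the pattern assumes the title is the first word that follows a space and ends with a dot, and every sample row except the 'Ms.' one is invented for illustration.

```python
import pandas as pd

# Hypothetical lookup table mirroring the groups used in title_mapper.
TITLE_GROUPS = {
    'Mr': 'Mr', 'Mrs': 'Mrs', 'Mme': 'Mrs', 'Miss': 'Miss', 'Mlle': 'Miss',
    'Dr': 'Dr', 'Rev': 'Cleric',
    'Jonkheer': 'Noble', 'Countess': 'Noble', 'Don': 'Noble', 'Ms': 'Noble',
    'Major': 'Military', 'Col': 'Military', 'Capt': 'Military',
}

def add_title(df):
    """Return a copy of df with a 'Title' column derived from 'Name'."""
    # Capture the first word that follows a space and ends with a literal dot.
    raw = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
    out = df.copy()
    # Titles missing from the table (or names with no title) become 'Other'.
    out['Title'] = raw.map(TITLE_GROUPS).fillna('Other')
    return out

# Illustrative rows only, not taken from the real dataset.
sample = pd.DataFrame({'Name': [
    'Reynaldo, Ms. Encarnacion',
    'Doe, Mr. John',
    'Doe, Mlle. Jane',
    'Doe, Sir. Arthur',
]})
print(add_title(sample))
```

Unlike the substring checks, this sketch only looks at the first title-like word in each name, so the order of the branches no longer matters and new titles can be added by extending the dictionary.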