I have a DataFrame called df_base that looks like this. As you can see, there's a column called Sex that's male or female. I want to map these values to 0 and 1, respectively.
+---+-------------+----------+--------+---------------------------------------------------+--------+-----+-------+-------+------------------+---------+-------+----------+
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
+---+-------------+----------+--------+---------------------------------------------------+--------+-----+-------+-------+------------------+---------+-------+----------+
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | NaN | S |
+---+-------------+----------+--------+---------------------------------------------------+--------+-----+-------+-------+------------------+---------+-------+----------+
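For anyone who wants to reproduce this, a minimal sketch of df_base could be built as follows. Only a subset of the columns from the table above is included; that subset is an arbitrary choice for illustration and not how the real data is loaded.

```python
import pandas as pd

# Toy version of df_base using the five rows shown in the table.
# Only a few columns are kept; the rest are omitted for brevity.
df_base = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
})

print(df_base)
```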
There are a few methods that I've seen dotted about on Stack Overflow, but I'm wondering which is the most efficient way to perform the following mapping:
+---------+---------+
| Old Sex | New Sex |
+---------+---------+
| male | 0 |
| female | 1 |
| female | 1 |
| female | 1 |
| male | 0 |
+---------+---------+
I'm using this:
df_base['Sex'].replace(['male','female'],[0,1],inplace=True)
... but I can't help but feel as though this is a little shoddy. Is there a better way of doing this? There's also .loc, but that loops over the rows of the DataFrame, so it's less efficient, right?
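
For reference, the alternatives I have in mind look roughly like this. It's only a sketch on a toy frame: the names df, mapped and Sex_code, and the -1 placeholder value, are made up for illustration, and I'm not claiming either one is the faster option.

```python
import pandas as pd

# Toy frame with the same Sex values as the first five rows above.
df = pd.DataFrame({"Sex": ["male", "female", "female", "female", "male"]})

# Alternative A: Series.map with a dict lookup.
# Values missing from the dict would become NaN.
mapped = df["Sex"].map({"male": 0, "female": 1})

# Alternative B: boolean masks with .loc, writing into a new column.
# -1 is a hypothetical placeholder for rows that match neither mask.
df["Sex_code"] = -1
df.loc[df["Sex"] == "male", "Sex_code"] = 0
df.loc[df["Sex"] == "female", "Sex_code"] = 1

print(mapped.tolist())
print(df["Sex_code"].tolist())
```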