218

I'm trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'.

I have tried the following:

w['female']['female']='1'
w['female']['male']='0' 

But I receive an unchanged copy of the previous results.

I would ideally like to get some output which resembles the following loop element-wise.

if w['female'] =='female':
    w['female'] = '1';
else:
    w['female'] = '0';

I've looked through the gotchas documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html) but cannot figure out why nothing happens.

Any help will be appreciated.

cs95
Black

16 Answers

357

If I understand right, you want something like this:

w['female'] = w['female'].map({'female': 1, 'male': 0})

(Here I convert the values to numbers instead of strings containing numbers. You can convert them to "1" and "0", if you really want, but I'm not sure why you'd want that.)

The reason your code doesn't work is that using ['female'] on a column (the second 'female' in your w['female']['female']) doesn't mean "select rows where the value is 'female'". It means "select rows where the index label is 'female'", of which there may not be any in your DataFrame.
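
As a minimal sketch (using an assumed toy DataFrame with the default integer index), you can see that the second indexer is a label lookup, while selecting rows by value requires a boolean mask:

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})

# The second ['female'] is a label lookup on the Series w['female'];
# with the default RangeIndex (0, 1, 2) there is no label 'female',
# so a plain read raises a KeyError instead of selecting matching rows.
try:
    w['female']['female']
except KeyError as err:
    print('KeyError:', err)

# Selecting rows by *value* needs a boolean mask instead:
print(w['female'][w['female'] == 'female'])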

BrenBarn
  • 7
    Thanks. Exactly what I was looking for. If I were to map 'female' to 1 and anything else to '0', how would that work? – Black Apr 26 '14 at 07:47
  • 40
    Use this only if all values in the column are given in the map function; column values not specified in the mapping will be replaced by NaN. – Chandra Mar 22 '17 at 18:56
  • 2
    I would also recommend using the `.loc` syntax to avoid `SettingWithCopyWarning`: https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing – NickBraunagel Mar 08 '18 at 20:15
  • 3
    instead of .map I used .replace – JS noob Sep 25 '18 at 15:48
  • How do I get rid of the '.' thousands separator in two or more columns? I cannot figure it out. Thank you so much – M. Mariscal Feb 10 '20 at 10:25
  • @M.Mariscal you may use `df.replace({'.':None})`. Or, better if you have '.' representing missing values in the data before reading it, you may specify `na_values` in `read_excel` or `read_csv` commands in pandas such as `pd.read_csv('data.csv', na_values='.')` – MECoskun Oct 09 '20 at 15:13
  • This solution works, but you need to specify all the keys in the column; the unspecified keys will be given a NaN value. – cerebrou Sep 20 '21 at 16:08
  • @Chandra You can keep the remaining values from being changed into NaN by simply using `.replace` instead of `.map` – Jorgo Feb 02 '23 at 14:56
  • I'm showing up 9 years late here, but for anyone reading this, @Black's question in the comments above can be answered using @Chandra's comment even though it wasn't meant as a response to @Black: ```w['female'] = w['female'].map({'female': 1}).fillna(0)``` – LuminosityXVII Aug 23 '23 at 09:16
148

You can edit a subset of a dataframe by using loc:

df.loc[<row selection>, <column selection>]

In this case:

w.loc[w.female != 'female', 'female'] = 0
w.loc[w.female == 'female', 'female'] = 1
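
For illustration, here is how this looks on an assumed toy DataFrame:

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})

# Boolean-mask row selection and column selection in a single .loc call,
# so the assignment writes back to the original DataFrame.
w.loc[w.female != 'female', 'female'] = 0
w.loc[w.female == 'female', 'female'] = 1

print(w)
#   female
# 0      1
# 1      0
# 2      1

Note that the order of the two assignments matters: if the == 'female' line ran first, the resulting 1s would then match the != 'female' condition and be overwritten with 0.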
Jimmy Petersson
  • 1
    How would I adapt it so I don't need to select specific rows via a condition, just all rows in a particular column? So change all cells in a column to a particular value. – Dhruv Ghulati Sep 08 '16 at 15:20
  • 4
    @DhruvGhulati, you would use df.loc[:, ] –  Dec 18 '16 at 23:41
68
w.female.replace(to_replace=dict(female=1, male=0), inplace=True)

See pandas.DataFrame.replace() docs.

jfs
53

Slight variation:

w.female.replace(['male', 'female'], [1, 0], inplace=True)
deckard
26

This should also work:

w.female[w.female == 'female'] = 1 
w.female[w.female == 'male']   = 0
Nick Crawford
18

This is very compact:

w['female'][w['female'] == 'female']=1
w['female'][w['female'] == 'male']=0

Another good one:

w['female'] = w['female'].replace(regex='female', value=1)
w['female'] = w['female'].replace(regex='male', value=0)
Azz
  • 2
    The first example is chained indexing and is warned against as it cannot guarantee whether the resulting df is a copy or a view. See [chained-indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label) – Nordle Jul 24 '18 at 11:00
12

You can also use apply with .get i.e.

w['female'] = w['female'].apply({'male':0, 'female':1}.get)

w = pd.DataFrame({'female':['female','male','female']})
print(w)

Dataframe w:

   female
0  female
1    male
2  female

Using apply to replace values from the dictionary:

w['female'] = w['female'].apply({'male':0, 'female':1}.get)
print(w)

Result:

   female
0       1
1       0
2       1 

Note: apply with a dictionary's .get should be used only if all possible values of the column are defined in the dictionary; otherwise, values not found in the dictionary come back as None.
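
As a small sketch of that caveat, you can pass a fallback through dict.get via a lambda (the -1 default here is purely an assumed placeholder):

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'other']})
mapping = {'male': 0, 'female': 1}

# dict.get returns None for keys missing from the mapping ...
print(w['female'].apply(mapping.get))                            # 1, 0, None

# ... so supply a default (here -1, purely illustrative) via a lambda:
print(w['female'].apply(lambda value: mapping.get(value, -1)))   # 1, 0, -1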

niraj
10

Using Series.map with Series.fillna

If your column contains strings other than just female and male, Series.map will fail in this case, since it will return NaN for those other values.

That's why we have to chain it with fillna:

Example why .map fails:

df = pd.DataFrame({'female':['male', 'female', 'female', 'male', 'other', 'other']})

   female
0    male
1  female
2  female
3    male
4   other
5   other
df['female'].map({'female': '1', 'male': '0'})

0      0
1      1
2      1
3      0
4    NaN
5    NaN
Name: female, dtype: object

For the correct method, we chain map with fillna, so we fill the NaN with values from the original column:

df['female'].map({'female': '1', 'male': '0'}).fillna(df['female'])

0        0
1        1
2        1
3        0
4    other
5    other
Name: female, dtype: object
Erfan
9

Alternatively there is the built-in function pd.get_dummies for these kinds of assignments:

w['female'] = pd.get_dummies(w['female'],drop_first = True)

This gives you a data frame with two columns, one for each value that occurs in w['female'], of which you drop the first (because you can infer it from the one that is left). The new column is automatically named as the string that you replaced.

This is especially useful if you have categorical variables with more than two possible values. This function creates as many dummy variables as are needed to distinguish between all cases. Be careful then that you don't assign the entire data frame to a single column; instead, if w['female'] could be 'male', 'female' or 'neutral', do something like this:

w = pd.concat([w, pd.get_dummies(w['female'], drop_first = True)], axis = 1)
w.drop('female', axis = 1, inplace = True)

Then you are left with two new columns giving you the dummy coding of 'female' and you got rid of the column with the strings.
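
As an assumed three-category example (exact dummy dtypes vary between pandas versions):

import pandas as pd

w = pd.DataFrame({'female': ['male', 'female', 'neutral']})

# drop_first drops the first dummy column (here 'female', the first
# category alphabetically), since it can be inferred from the others.
w = pd.concat([w, pd.get_dummies(w['female'], drop_first=True)], axis=1)
w.drop('female', axis=1, inplace=True)

print(w.columns.tolist())   # ['male', 'neutral']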

galliwuzz
8
w.replace({'female':{'female':1, 'male':0}}, inplace = True)

The above code will replace 'female' with 1 and 'male' with 0, but only in the column 'female'.
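
For example, on an assumed two-column frame the nested dictionary keeps the replacement scoped to the 'female' column and leaves identical strings elsewhere untouched:

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male'],
                  'note': ['female', 'other']})

# Outer key = column to operate on, inner dict = old value -> new value.
w.replace({'female': {'female': 1, 'male': 0}}, inplace=True)

print(w)
#   female    note
# 0      1  female
# 1      0   other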

user41855
3

There is also a function in pandas called factorize which you can use to automatically do this type of work. It converts labels to numbers: ['male', 'female', 'male'] -> [0, 1, 0]. See this answer for more information.
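
A brief sketch of pd.factorize (note that codes are assigned in order of first appearance, so this does not necessarily match the 1/0 coding the OP asked for):

import pandas as pd

w = pd.DataFrame({'female': ['male', 'female', 'male']})

codes, uniques = pd.factorize(w['female'])
print(codes)    # [0 1 0]
print(uniques)  # Index(['male', 'female'], dtype='object')

w['female'] = codes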

Roald
2
w.female = np.where(w.female=='female', 1, 0)

This is a NumPy solution, in case someone is looking for one. It is useful for replacing values based on a condition; both the if and else branches are inherent in np.where(). The solutions that use df.replace() may not be feasible if the column includes many unique values in addition to 'male', all of which should be replaced with 0.

Another solution is to use df.where() and df.mask() in succession, since neither of them implements an else condition on its own.

w.female.where(w.female=='female', 0, inplace=True) # replace where condition is False
w.female.mask(w.female=='female', 1, inplace=True) # replace where condition is True
Achintha Ihalage
1
dic = {'female':1, 'male':0}
w['female'] = w['female'].replace(dic)

.replace takes a dictionary as its argument, mapping the old values to whatever new values you want or need.

ebuitragod
0

It is worth pointing out which type of object you get back from the methods suggested above: a Series or a DataFrame.

Selecting a column with w.female or w['female'] returns a Series, while double brackets such as w[['female']] return a one-column DataFrame. Both Series and DataFrame have a .replace method, and a Series additionally offers .map and .apply, so the main thing is to know which object you are working with before chaining further operations.
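
A quick sketch (with an assumed toy frame) showing the types involved:

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male']})

print(type(w['female']))     # <class 'pandas.core.series.Series'>
print(type(w[['female']]))   # <class 'pandas.core.frame.DataFrame'>

# Both objects expose .replace, so either form works here:
print(w['female'].replace({'female': 1, 'male': 0}).tolist())            # [1, 0]
print(w[['female']].replace({'female': 1, 'male': 0}).values.tolist())   # [[1], [0]]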

Alex-droid AD
0

To answer the question more generically so it applies to more use cases than just what the OP asked, consider this solution. I used jfs's solution to help me. Here, we create two functions that feed into each other and can be used whether or not you know the exact replacements.

import numpy as np
import pandas as pd


class Utility:

    @staticmethod
    def rename_values_in_column(column: pd.Series, name_changes: dict = None) -> pd.Series:
        """
        Renames the distinct values in a column. If no dictionary of exact name changes is provided, the column is
        returned unchanged; combine this with create_unique_values_for_column below to generate default names such as female_1, female_2, etc.

        :param column: The column in your dataframe you would like to alter.
        :param name_changes: A dictionary of the old values to the new values you would like to change.
        Ex. {1234: "User A"} This would change all occurrences of 1234 to the string "User A" and leave the other values as they were.
        By default, this is an empty dictionary.
        :return: The same column with the replaced values
        """
        name_changes = name_changes if name_changes else {}
        new_column = column.replace(to_replace=name_changes)
        return new_column

    @staticmethod
    def create_unique_values_for_column(column: pd.Series, except_values: list = None) -> dict:
        """
        Creates a dictionary where the key is the existing column item and the value is the new item to replace it.
        The returned dictionary can then be passed the pandas rename function to rename all the distinct values in a
        column.
        Ex. column ["statement"]["I", "am", "old"] would return
        {"I": "statement_1", "am": "statement_2", "old": "statement_3"}

        If you would like a value to remain the same, enter the values you would like to stay in the except_values.
        Ex. except_values = ["I", "am"]
        column ["statement"]["I", "am", "old"] would return
        {"old": "statement_3"}

        :param column: A pandas Series for the column with the values to replace.
        :param except_values: A list of values you do not want to have changed.
        :return: A dictionary that maps the old values to their respective new values.
        """
        except_values = except_values if except_values else []
        column_name = column.name
        distinct_values = np.unique(column)
        name_mappings = {}
        count = 1
        for value in distinct_values:
            if value not in except_values:
                name_mappings[value] = f"{column_name}_{count}"
                count += 1
        return name_mappings

For the OP's use case, it is simple enough to just use

w["female"] = Utility.rename_values_in_column(w["female"], name_changes={"female": 0, "male": 1})

However, it is not always so easy to know all of the different unique values within a data frame that you may want to rename. In my case, the string values for a column are hashed values so they hurt the readability. What I do instead is replace those hashed values with more readable strings thanks to the create_unique_values_for_column function.

df["user"] = Utility.rename_values_in_column(
    df["user"],
    Utility.create_unique_values_for_column(df["user"])
)

This will change my user column values from ["1a2b3c", "a12b3c", "1a2b3c"] to ["user_1", "user_2", "user_1"]. Much easier to compare, right?

DarkHark
0

If you have only two classes, you can use the equality operator. For example:

df = pd.DataFrame({'col1':['a', 'a', 'a', 'b']})

df['col1'].eq('a').astype(int)
# (df['col1'] == 'a').astype(int)

Output:

0    1
1    1
2    1
3    0
Name: col1, dtype: int64
Mykola Zotko