49

I have a DataFrame named df:

  Order Number       Status
1         1668  Undelivered
2        19771  Undelivered
3    100032108  Undelivered
4         2229    Delivered
5        00056  Undelivered

I would like to convert the Status column to boolean (True when Status is 'Delivered' and False when Status is 'Undelivered'). If Status is neither 'Undelivered' nor 'Delivered', it should be treated as NaN or something similar.

I would like to use a dict

d = {
  'Delivered': True,
  'Undelivered': False
}

so that I can easily add other strings to be treated as either True or False.

smci
working4coins

4 Answers

65

You can just use map:

In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                     'SomethingElse']})

In [8]: df
Out[8]:
          Status
0      Delivered
1      Delivered
2    Undelivered
3  SomethingElse

In [9]: d = {'Delivered': True, 'Undelivered': False}

In [10]: df['Status'].map(d)
Out[10]:
0     True
1     True
2    False
3      NaN
Name: Status, dtype: object
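
The result above has dtype object because NaN is mixed in with the booleans. If a true boolean column with an explicit missing marker is preferred, one option is pandas' nullable boolean dtype. This is a minimal sketch, assuming pandas 1.0 or later; the names mapped and status_bool are illustrative and not part of the original answer.

```python
import pandas as pd

# Explicit object dtype for the string column, matching the session above.
df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered',
                              'SomethingElse']}, dtype=object)
d = {'Delivered': True, 'Undelivered': False}

# map() looks each value up in d; values missing from d become NaN.
mapped = df['Status'].map(d)

# Assumption: pandas >= 1.0, where the nullable "boolean" dtype exists.
# It stores True/False and marks the unmapped row as <NA>.
status_bool = mapped.astype('boolean')
print(status_bool)
```

Rows whose status is not a key of d can then be located with status_bool.isna().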
joris
19

Here is an example of using the replace method to replace values only in a specified column (C2), returning the result as a DataFrame.

import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})

  C1 C2
0  X  Y
1  Y  Y
2  X  X
3  Y  X

df.replace({'C2': {'X': True, 'Y': False}})

  C1     C2
0  X  False
1  Y  False
2  X   True
3  Y   True
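
Note that replace returns a new DataFrame rather than modifying df. The sketch below, which assumes the same toy data as above (the name converted is illustrative), shows that C1 and the original df are left untouched:

```python
import pandas as pd

# Explicit object dtype for the string columns.
df = pd.DataFrame({'C1': ['X', 'Y', 'X', 'Y'],
                   'C2': ['Y', 'Y', 'X', 'X']}, dtype=object)

# The outer key selects the column, the inner dict maps old -> new values.
converted = df.replace({'C2': {'X': True, 'Y': False}})

print(converted)  # C2 holds booleans, C1 still holds 'X' / 'Y'
print(df)         # the original DataFrame is unchanged
```

To keep the result, assign it back, for example df = df.replace(...).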
Kappa Leonis
  • While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Donald Duck Mar 18 '17 at 08:53
13

Expanding on the previous answers:

Map method explained:

  • Pandas looks up each row's value in the dictionary d and replaces it with the corresponding value from d.
  • Values that are not keys in d become NaN. These can then be filled in with fillna().
  • Works on one column at a time, since map is called on a pd.Series here.
  • Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)

Replace method explained:

  • Pandas looks up each row's value in the dictionary d and replaces any value that matches a key with the corresponding value from d.
  • Values that are not keys in d are retained unchanged.
  • Works with single and multiple columns (pd.Series or pd.DataFrame objects).
  • Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)

Overall, the replace method is more flexible: it works on both Series and DataFrames and leaves unmatched values untouched, so you can decide separately how to handle missing or NaN values.
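
The difference between the two methods shows up only for values that are not keys in d. A minimal side-by-side sketch, assuming the sample statuses from the first answer (the names s, mapped, replaced and comparison are illustrative):

```python
import pandas as pd

# Explicit object dtype for the string data.
s = pd.Series(['Delivered', 'Delivered', 'Undelivered', 'SomethingElse'],
              name='Status', dtype=object)
d = {'Delivered': True, 'Undelivered': False}

mapped = s.map(d)        # value not in d becomes NaN
replaced = s.replace(d)  # value not in d is kept as-is

# Put the results next to each other for comparison.
comparison = pd.DataFrame({'original': s, 'map': mapped, 'replace': replaced})
print(comparison)
```

For the question as asked, where an unknown status should be flagged as missing, the map behaviour is the one that matches.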

Yaakov Bressler
7

You've got everything you need. You'll be happy to discover replace:

df.replace(d)
Dan Allan
  • Ah, I only see it now I posted my answer. Is there a difference with `map` in this case? – joris Jul 17 '13 at 14:41
  • It seems that a value that is not in the dict is simply left as-is with `replace`, but converted to `NaN` with `map` – joris Jul 17 '13 at 14:46
  • I think ``map`` is a better choice here, actually, because if a value isn't in ``d`` then the value is invalid and should be replaced with ``NaN``. – Dan Allan Jul 17 '13 at 14:46
  • `replace` seems to apply to a DataFrame, not to a Series – working4coins Jul 17 '13 at 15:21
  • It applies to both. My link was to the DataFrame documentation; here's one for Series. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Series.replace.html – Dan Allan Jul 17 '13 at 15:33