Convert Pandas series containing string to boolean

Question

I have a DataFrame named df as

  Order Number       Status
1         1668  Undelivered
2        19771  Undelivered
3    100032108  Undelivered
4         2229    Delivered
5        00056  Undelivered

I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered' it should be considered NaN (not a number) or something like that.

I would like to use a dict:

d = {
  'Delivered': True,
  'Undelivered': False
}

so that I could easily add other strings to be considered either True or False.

score 65 · Accepted Answer · answered Jul 17 '13 at 14:41

You can just use map:

In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                     'SomethingElse']})

In [8]: df
Out[8]:
          Status
0      Delivered
1      Delivered
2    Undelivered
3  SomethingElse

In [9]: d = {'Delivered': True, 'Undelivered': False}

In [10]: df['Status'].map(d)
Out[10]:
0     True
1     True
2    False
3      NaN
Name: Status, dtype: object
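
Because of the NaN, the mapped result above has `object` dtype. Assuming pandas 1.0 or later (where the nullable `"boolean"` dtype exists), one option is to cast the mapped Series to that dtype so unknown statuses show up as `<NA>` alongside real True/False values. A minimal sketch reusing the data from this answer:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered',
                              'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# map() gives True/False for known strings and NaN for anything else (object dtype)
mapped = df['Status'].map(d)

# Cast to the nullable boolean dtype: NaN becomes <NA>, known values stay True/False
status_bool = mapped.astype('boolean')
print(status_bool.dtype)  # boolean

# If unknown statuses should count as False instead, fill the missing values explicitly
status_filled = status_bool.fillna(False)
print(status_filled.tolist())  # [True, True, False, False]
```

Whether unknown values should stay missing or be filled is a design choice; keeping `<NA>` makes invalid statuses easy to spot later.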

joris

I'm getting `AttributeError: 'DataFrame' object has no attribute 'map'`. – gwthm.in Sep 03 '17 at 13:54
`map` is a method on the Series, not DataFrame. – joris Sep 11 '17 at 06:55
Yeah, got it, sorry about that. – gwthm.in Sep 11 '17 at 08:25

Kappa Leonis · Answer 2 · 2017-03-19T07:35:20.040

An example of using the replace method to replace values only in the specified column C2 and get the result as a DataFrame.

import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})

  C1 C2
0  X  Y
1  Y  Y
2  X  X
3  Y  X

df.replace({'C2': {'X': True, 'Y': False}})

  C1     C2
0  X  False
1  Y  False
2  X   True
3  Y   True
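
To see why the nested `{'C2': {...}}` form matters, here is a sketch comparing it with a flat dict on the same data. With a flat dict, `replace` looks at every column, so `C1` (which also contains 'X' and 'Y') gets converted too:

```python
import pandas as pd

df = pd.DataFrame({'C1': ['X', 'Y', 'X', 'Y'], 'C2': ['Y', 'Y', 'X', 'X']})
mapping = {'X': True, 'Y': False}

# Nested dict {column: {old: new}}: only C2 is touched
only_c2 = df.replace({'C2': mapping})

# Flat dict {old: new}: every column containing 'X' or 'Y' is replaced
all_cols = df.replace(mapping)

print(only_c2)
print(all_cols)
```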

edited Mar 19 '17 at 07:35

answered Mar 18 '17 at 07:38

Kappa Leonis

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Donald Duck Mar 18 '17 at 08:53

score 13 · Answer 3 · answered May 10 '20 at 19:03

Expanding on the previous answers:

Map method explained:

Pandas will look up each row's value in the dictionary d, replacing any found keys with the corresponding values from d.
Values that are not keys in d will be set to NaN. These can be filled afterwards with fillna().
Does not work on multiple columns at once, since map (with a dict) is a pd.Series method and operates on a single column.
Documentation: pd.Series.map
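
Since `map` with a dict works column by column, one way to apply the same dict to several columns is to map each column in turn with `DataFrame.apply`. A sketch, where the `ReturnStatus` column and the 'Lost' value are hypothetical, made up purely for illustration:

```python
import pandas as pd

d = {'Delivered': True, 'Undelivered': False}

# Hypothetical frame with two status-like columns
df = pd.DataFrame({
    'Status': ['Delivered', 'Undelivered', 'Lost'],
    'ReturnStatus': ['Undelivered', 'Delivered', 'Delivered'],
})

cols = ['Status', 'ReturnStatus']
# apply() passes each column (a Series) to the lambda, so Series.map can be used on it
df[cols] = df[cols].apply(lambda col: col.map(d))
print(df)  # 'Lost' is not a key of d, so it becomes NaN
```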

d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)

Replace method explained:

Pandas will look up each row's value in the dictionary d and replace any found keys with the corresponding values from d.
Values that are not keys in d will be retained unchanged.
Works with single and multiple columns (pd.Series or pd.DataFrame objects).
Documentation: pd.DataFrame.replace

d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
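
A short sketch of the practical difference between the two methods, assuming a Status value ('SomethingElse') that is not a key of d: `replace` leaves it untouched, whereas `map` turns it into NaN.

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

replaced = df['Status'].replace(d)  # unmatched value is kept as the original string
mapped = df['Status'].map(d)        # unmatched value becomes NaN

print(replaced.tolist())
print(mapped.tolist())
```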

Overall, the replace method is more robust and allows finer control over how data is mapped and how missing or NaN values are handled.

score 7 · Answer 4 · answered Jul 17 '13 at 14:38

You've got everything you need. You'll be happy to discover replace:

df.replace(d)

Dan Allan

Ah, I only saw this after I posted my answer. Is there a difference with `map` in this case? – joris Jul 17 '13 at 14:41
It seems that anything else (not in the dict) is just left as-is with `replace`, but converted to `NaN` with `map`. – joris Jul 17 '13 at 14:46
I think `map` is a better choice here, actually, because if a value isn't in `d` then the value is invalid and should be replaced with `NaN`. – Dan Allan Jul 17 '13 at 14:46
`replace` seems to apply to a DataFrame, not to a Series – working4coins Jul 17 '13 at 15:21
It applies to both. My link was to the DataFrame documentation; here's one for Series. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Series.replace.html – Dan Allan Jul 17 '13 at 15:33
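
Following Dan Allan's point that a value missing from d is invalid, the NaNs produced by `map` can double as a validity check. A sketch that lists the rows whose Status is not a key of d; the sample data, including the 'Pending' status, is made up for illustration:

```python
import pandas as pd

# Made-up sample data; 'Pending' is not a key of d
df = pd.DataFrame({'Order Number': ['1668', '19771', '2229'],
                   'Status': ['Undelivered', 'Pending', 'Delivered']})
d = {'Delivered': True, 'Undelivered': False}

mapped = df['Status'].map(d)
# Keep only the rows whose Status could not be looked up in d
invalid = df[mapped.isna()]
print(invalid)
```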
