Pandas - Remove b"" from dataframe

Question

I am trying to remove the b'' wrapper from every value in my dataframes (e.g. b'stackoverflow' should become stackoverflow).

I came across "Removing b'' from string column in a pandas dataframe"; however, it only covers doing this for one column.

Is there a way to apply this to all the columns in my dataframe?

Note: all my columns are object types.

I have tried:

df = df.astype(str)
df = df.str.decode('utf-8')

Hey, I'm getting the error `AttributeError: 'float' object has no attribute 'decode'` – Jonnyboi Sep 07 '21 at 20:00
Sorry, I didn't have both in there. But now I get `AttributeError: 'str' object has no attribute 'decode'` – Jonnyboi Sep 07 '21 at 20:42

score 2 · Answer 1 · answered Sep 07 '21 at 20:05

You can use the following:

df.apply(lambda x: x.str.decode('utf-8'))

Mohammad

Thanks, I still have the `b'stackoverflow'` though. I am using `df = df.astype(str)` before your line. – Jonnyboi Sep 07 '21 at 20:45
@Jonnyboi Can you provide some sample data? – Mohammad Sep 07 '21 at 20:50
@Jonnyboi just a sanity check, make sure to assign back or the changes won't be saved: `df = df.apply(...` – tdy Sep 07 '21 at 21:45
ok this works, but seems to convert everything into `NaN`. – Jonnyboi Sep 07 '21 at 23:58
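
One possible explanation for the comments above, offered as a sketch rather than a confirmed diagnosis: `str()` applied to a bytes object returns its printed representation, so `str(b'stackoverflow')` is the text `b'stackoverflow'`, not `stackoverflow`. Depending on the pandas version, `astype(str)` may do the same to each cell, after which there are no bytes left to decode. If the data has already been stringified this way, the original bytes can be recovered with `ast.literal_eval`. The helper name `unstringify_bytes` and the sample values are hypothetical.

```python
import ast

import pandas as pd


def unstringify_bytes(value, encoding="utf-8"):
    """Turn the text "b'abc'" back into "abc"; leave other values alone.

    Hypothetical helper, not part of pandas.
    """
    if isinstance(value, str) and value[:2] in ("b'", 'b"'):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # looked like a bytes literal but was not one
        if isinstance(parsed, bytes):
            return parsed.decode(encoding)
    return value


# str() on bytes yields the repr, including the b'' wrapper.
stringified = str(b"stackoverflow")

# A column that was stringified before decoding (illustrative data).
s = pd.Series([stringified, "plain text"])
cleaned = s.map(unstringify_bytes)
```

The simpler fix, when possible, is to drop the `astype(str)` step and decode the original bytes directly.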

Panwen Wang · Answer 2 · 2021-09-08T01:28:58.727

Your df most likely contains mixed data types. First, select the columns in which every value is bytes:

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [b"aa", b"ab"], "b": [b"ba", b"bb"], "c": [1.1, 1.2]})
>>> df
         a        b         c
  <object> <object> <float64>
0    b'aa'    b'ba'       1.1
1    b'ab'    b'bb'       1.2

>>> bytes_cols = df.applymap(lambda col: isinstance(col, bytes)).all(0)
>>> bytes_cols = df.columns[bytes_cols]
>>> bytes_cols
Index(['a', 'b'], dtype='object')

Then only convert those columns:

>>> df.loc[:, bytes_cols] = df[bytes_cols].applymap(lambda col: col.decode("utf-8", errors="ignore"))
>>> df
         a        b         c
  <object> <object> <float64>
0       aa       ba       1.1
1       ab       bb       1.2
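
Note that `errors="ignore"` silently drops any byte that is not valid UTF-8, which can lose data. A small sketch of how the common error handlers differ, using a made-up byte string that contains the byte 0x81; the Latin-1 fallback is only an assumption about what the real source encoding might be.

```python
data = b"caf\x81 latte"  # made-up sample; 0x81 is not a valid UTF-8 start byte

try:
    data.decode("utf-8")  # strict mode, the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = data.decode("utf-8", errors="ignore")    # drops the bad byte
replaced = data.decode("utf-8", errors="replace")  # inserts U+FFFD instead
latin1 = data.decode("latin-1")                    # maps every byte to a character
```

If the bytes come from a known non-UTF-8 source, decoding with that encoding is usually preferable to ignoring errors.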

edited Sep 08 '21 at 01:28

answered Sep 07 '21 at 21:40

getting error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 14: invalid start byte` on line `df.loc[:, bytes_cols] = df[bytes_cols].applymap(lambda col: col.decode("utf-8"))` – Jonnyboi Sep 08 '21 at 00:32
@Jonnyboi Then you have some characters that cannot be decoded into 'utf-8'. You may try: `col.decode("utf-8", errors='ignore')` (see my updated answer) – Panwen Wang Sep 08 '21 at 01:27
Thanks, it works! However, there are a couple of columns it didn't work on: columns that have blank records in some rows, including the row after the header. – Jonnyboi Sep 08 '21 at 13:05
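
On the blank records: the `.all(0)` check in the answer only selects a column when every cell is bytes, so a column with a missing value is skipped entirely. A minimal sketch of one way around this is to decode cell by cell and pass non-bytes values through unchanged. The helper name `decode_cell` and the sample data are hypothetical.

```python
import pandas as pd


def decode_cell(value, encoding="utf-8", errors="ignore"):
    """Decode one cell if it is bytes; return anything else unchanged.

    Hypothetical helper, not part of pandas.
    """
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding, errors=errors)
    return value


# Made-up sample: bytes columns with blank (missing) records, plus a float column.
df = pd.DataFrame(
    {
        "a": [None, b"aa", b"ab"],
        "b": [b"ba", float("nan"), b"bb"],
        "c": [1.1, 1.2, 1.3],
    }
)

# Only object columns can hold bytes; numeric columns are left untouched.
for name in df.columns:
    if df[name].dtype == object:
        df[name] = df[name].map(decode_cell)
```

Missing values stay missing, and the float column is not touched, so this should also avoid the `'float' object has no attribute 'decode'` error from the question.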
