
I am trying to remove the b'' prefix from all values in my dataframes (i.e. b'stackoverflow' to stackoverflow).

I came across "Removing b'' from string column in a pandas dataframe", but it only covers doing this to one column.

Is there a way to apply this to every column in my dataframe?

Note: all my columns are object types.

I have tried:

df = df.astype(str)
df = df.str.decode('utf-8') 
Mohammad
Jonnyboi

2 Answers


You can use the following:

df = df.apply(lambda x: x.str.decode('utf-8'))
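A minimal runnable check of this one-liner, assuming (as the question states) that every column has object dtype and holds bytes values. The sample data is made up for illustration. `apply` returns a new DataFrame, so the result is assigned back; note that the `.str` accessor raises an `AttributeError` on non-string columns such as `float64`, so this version only suits all-object frames.

```python
import pandas as pd

# Hypothetical sample data: every column holds bytes (object dtype).
df = pd.DataFrame({"a": [b"stackoverflow", b"pandas"], "b": [b"x", b"y"]})

# Decode each column with Series.str.decode; apply builds a new
# DataFrame, so reassign it to keep the decoded values.
df = df.apply(lambda col: col.str.decode("utf-8"))

print(df)
```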
Mohammad

You probably have mixed data types in your df. First, select the columns that hold bytes:

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [b"aa", b"ab"], "b": [b"ba", b"bb"], "c": [1.1, 1.2]})
>>> df
         a        b         c
  <object> <object> <float64>
0    b'aa'    b'ba'       1.1
1    b'ab'    b'bb'       1.2

>>> bytes_cols = df.applymap(lambda val: isinstance(val, bytes)).all(axis=0)
>>> bytes_cols = df.columns[bytes_cols]
>>> bytes_cols
Index(['a', 'b'], dtype='object')

Then convert only those columns:

>>> df.loc[:, bytes_cols] = df[bytes_cols].applymap(lambda val: val.decode("utf-8", errors="ignore"))
>>> df
         a        b         c
  <object> <object> <float64>
0       aa       ba       1.1
1       ab       bb       1.2
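With `errors="ignore"`, any byte that is not valid UTF-8 is silently dropped. A small standalone sketch of the alternatives, using a made-up value that contains byte 0x81 like the one in the error reported in the comments: `errors="replace"` leaves a visible U+FFFD marker, and a fallback codec keeps the byte. The helper `decode_bytes` and the choice of latin-1 as the fallback are hypothetical; latin-1 never fails because it maps every byte to a character, but it is only correct if the data really is latin-1.

```python
# Made-up value: 0x81 is not a valid UTF-8 start byte.
raw = b"caf\x81"

print(raw.decode("utf-8", errors="ignore"))   # caf
print(raw.decode("utf-8", errors="replace"))  # caf followed by U+FFFD


def decode_bytes(value, fallback="latin-1"):
    """Hypothetical helper: try UTF-8 first, then a fallback codec."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: latin-1 is an acceptable guess for this data.
        # It maps every byte to a code point, so it cannot fail.
        return value.decode(fallback)


print(repr(decode_bytes(raw)))               # 'caf\x81'
print(repr(decode_bytes(b"stackoverflow")))  # 'stackoverflow'
```

It could replace the lambda in the answer's code, e.g. `df[bytes_cols].applymap(decode_bytes)`.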
Panwen Wang
  • Getting the error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 14: invalid start byte` on the line `df.loc[:, bytes_cols] = df[bytes_cols].applymap(lambda col: col.decode("utf-8"))` – Jonnyboi Sep 08 '21 at 00:32
  • @Jonnyboi Then you have some characters that cannot be decoded into 'utf-8'. You may try: `col.decode("utf-8", errors='ignore')` (see my updated answer) – Panwen Wang Sep 08 '21 at 01:27
  • Thanks, it works! However, there are a couple of columns it didn't work on: columns that have blank records in some rows, including the row right after the header. – Jonnyboi Sep 08 '21 at 13:05
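The last comment matches how the selection step works: `.all(0)` keeps only columns in which every cell is bytes, so a column with blank cells is skipped. A sketch that instead decodes only the cells that are bytes and returns everything else unchanged, assuming the blanks are NaN or empty strings (the comment does not say which). The helper `decode_cell` and the sample data are hypothetical. It maps each object column with `Series.map`, which also sidesteps `DataFrame.applymap` being deprecated in favour of `DataFrame.map` in pandas 2.1.

```python
import numpy as np
import pandas as pd


def decode_cell(value):
    """Hypothetical helper: decode bytes, return anything else unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="ignore")
    return value  # NaN, str, numbers, etc. pass through


# Hypothetical data with blank cells, including the first row.
df = pd.DataFrame({
    "a": [np.nan, b"aa", b"ab"],
    "b": ["", b"ba", b"bb"],
    "c": [1.1, 1.2, 1.3],
})

# Only object columns can hold bytes; map each one cell by cell.
for name in df.select_dtypes(include="object").columns:
    df[name] = df[name].map(decode_cell)

print(df)
```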