Removing b'' from string column in a pandas dataframe

Question

I have a DataFrame taken from the SDSS database. Example data is here.

I want to remove the character 'b' from data['class']. I tried

data['class'] = data['class'].replace('b', '')

But I am not getting the expected result.

Please do not post screenshots. They are really not helping. Instead share data with the simple command: df.head().to_dict() as an example. -1 — Anton vBR, Oct 11 '17 at 20:49

score 32 · Answer 1 · answered Oct 11 '17 at 20:17

You're working with byte strings. You might consider str.decode:

data['class'] = data['class'].str.decode('utf-8')
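
To illustrate why decoding works where replace does not, here is a minimal sketch. The `b` is not a character inside the value; it is how Python displays a `bytes` object. The class labels below are made-up stand-ins, since the asker's actual data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the asker's column; the real values are not shown.
data = pd.DataFrame({"class": [b"STAR", b"GALAXY", b"QSO"]})

# Each element is a bytes object, which is why it prints with a b'' prefix.
first_type = type(data["class"].iloc[0])

# Decode every element from bytes to str.
data["class"] = data["class"].str.decode("utf-8")

print(first_type)
print(data["class"].tolist())
```

If the bytes are not valid UTF-8, pass the encoding that matches the source data instead.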

cs95

@cᴏʟᴅsᴘᴇᴇᴅ I smashed that up arrow for you. – Scott Boston Oct 11 '17 at 20:30
@ScottBoston Lol, that made me chuckle. Thanks ;-) – cs95 Oct 11 '17 at 20:31
Although true, the key here is the .str method for pd.series. The decode is just a normal string function. Anyway +1 :) – Anton vBR Oct 11 '17 at 20:42
When smashing... try not to break it! – piRSquared Oct 11 '17 at 20:47
I do not like number 4 , so I make it to 5 – BENY Oct 11 '17 at 20:47
Question: is there a way to apply this for all columns in a dataframe? – Jonnyboi Sep 07 '21 at 18:00
@Jonnyboi `df.apply` comes to mind – cs95 Oct 01 '21 at 11:06
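
One possible way to act on the `df.apply` suggestion is sketched below. The helper names `decode_value` and `decode_bytes_columns` are hypothetical, as are the example columns. The sketch assumes byte strings live in object-dtype columns and leaves every other value untouched.

```python
import pandas as pd


def decode_value(value, encoding="utf-8"):
    """Decode a bytes object to str; return any other value unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a new DataFrame with bytes in object columns decoded to str."""
    return df.apply(
        lambda col: col.map(lambda v: decode_value(v, encoding))
        if col.dtype == object
        else col
    )


# Hypothetical example data: one bytes column and one numeric column.
example = pd.DataFrame({"class": [b"STAR", b"GALAXY"], "redshift": [0.1, 0.2]})
decoded = decode_bytes_columns(example)
print(decoded)
```

Checking each value with `isinstance` keeps the helper safe for columns that mix bytes with missing values or ordinary strings.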

score 1 · Answer 2 · answered Oct 11 '17 at 20:46

Further explanation:

import pandas as pd

df = pd.DataFrame([b'123']) # create a DataFrame with a bytes element

Now we can call

df[0].str.decode('utf-8') # returns a pd.Series with decode applied to each element
df[0].decode('utf-8') # raises AttributeError: a Series itself has no decode method

The .str accessor applies the string method to every element of the Series. The same thing could also be written like this:

df[0].apply(lambda x: x.decode('utf-8'))
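
The pieces above can be put together into one runnable sketch, assuming a small made-up DataFrame. It compares the `.str` accessor with the `apply` form and records whether calling `decode` directly on the Series fails.

```python
import pandas as pd

# Made-up example data with bytes elements.
df = pd.DataFrame([b"123", b"abc"])

# Element-wise decode through the .str accessor.
via_accessor = df[0].str.decode("utf-8")

# The same operation written with an explicit per-element function.
via_apply = df[0].apply(lambda x: x.decode("utf-8"))

# A Series has no decode method of its own, so this raises AttributeError.
try:
    df[0].decode("utf-8")
    series_has_decode = True
except AttributeError:
    series_has_decode = False

print(via_accessor.tolist(), via_apply.tolist(), series_has_decode)
```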
