-1

I need to replace special characters, or rather extract the string inside the quotes. I already tried df.replace, but it's not working.

I have a DataFrame like this:

b'rgcr8fpzpx1s7x4a'
b'ue98rkzajy64hrbw'
b'u1u5ucr56y9d8rn4'

I need to get output like this:

rgcr8fpzpx1s7x4a
ue98rkzajy64hrbw
u1u5ucr56y9d8rn4
s_khan92

2 Answers

4

I would use extract with a regex:

df[0].str.extract(r"b'(.*)'")

Output:

                  0
0  rgcr8fpzpx1s7x4a
1  ue98rkzajy64hrbw
2  u1u5ucr56y9d8rn4
Scott Boston
  • Thanks for the answer. I am just curious: whenever I try to use this on my main dataframe, I lose all the values. For example, you used the column name '0', so it should be `df[0] = df[0].str.extract("b\'(.*)\'")`, but it removed all my data. Am I making a mistake? – s_khan92 May 20 '20 at 15:10
  • You need to replace the 0 with the actual column name in your dataframe. I just copied the data from this question and created a dataframe whose first column is named 0, hence the 0. Please use your actual column header. – Scott Boston May 20 '20 at 15:24
  • Yes, of course, I know that, and I am using my own column name :) But once I use it, I lose all my data: it converts all the values in that column to NaN. This is what I used: `df['uuid: Participant identifier'] = df['uuid: Participant identifier'].str.extract("b\'(.*)\'")` – s_khan92 May 20 '20 at 15:44
  • You can use `fillna(df['uuid: Participant identifier'])` to fill those NaN values with the original values that don't match the pattern. You might need to adjust the regex to match your data. I am using what you have in this question as a test. – Scott Boston May 20 '20 at 15:45
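The extract-then-fillna idea from the comments above could look roughly like this. It is only a sketch: the sample values are made up for illustration, and it assumes the column holds strings that look like bytes literals rather than real bytes objects. `expand=False` is used so that `extract` returns a Series that can be assigned back to a single column.

```python
import pandas as pd

# Hypothetical data: strings that look like bytes literals, plus one value
# that does not match the pattern at all.
col = "uuid: Participant identifier"
df = pd.DataFrame({col: ["b'rgcr8fpzpx1s7x4a'", "b'ue98rkzajy64hrbw'", "plain-value"]})

# With a single capture group and expand=False, extract returns a Series.
extracted = df[col].str.extract(r"b'(.*)'", expand=False)

# Rows that do not match the regex come back as NaN; keep the original there.
df[col] = extracted.fillna(df[col])

print(df[col].tolist())
# ['rgcr8fpzpx1s7x4a', 'ue98rkzajy64hrbw', 'plain-value']
```

If every row still ends up as NaN before the `fillna` step, the regex is not matching any of the stored values, so it is worth checking what the column actually contains, for example with `df[col].map(type).value_counts()`.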
1

Those values seem to be byte strings; try converting them to str.

df['col'] = df['col'].apply(lambda x: x.decode())

0    rgcr8fpzpx1s7x4a
1    ue98rkzajy64hrbw
2    u1u5ucr56y9d8rn4
Name: col, dtype: object
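Note that `x.decode()` only works when the column holds real `bytes` objects; on a plain string such as `"b'abc'"` it raises `AttributeError`. If it is unclear which of the two the column contains, a small helper can handle both. This is a sketch, not part of the answer above: `to_text` is a hypothetical name, and UTF-8 is an assumed encoding.

```python
import ast


def to_text(value, encoding="utf-8"):
    """Return plain text for real bytes or for a string like "b'abc'"."""
    if isinstance(value, bytes):
        # Real bytes object: decode it directly.
        return value.decode(encoding)
    if isinstance(value, str) and value.startswith(("b'", 'b"')):
        # String that looks like a bytes literal: parse it safely first.
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a valid literal after all, leave it alone
        if isinstance(parsed, bytes):
            return parsed.decode(encoding)
    # Anything else is returned unchanged.
    return value


print(to_text(b"rgcr8fpzpx1s7x4a"))     # rgcr8fpzpx1s7x4a
print(to_text("b'ue98rkzajy64hrbw'"))   # ue98rkzajy64hrbw
print(to_text("u1u5ucr56y9d8rn4"))      # u1u5ucr56y9d8rn4
```

It could then be applied to the column with `df['col'] = df['col'].map(to_text)`.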
sushanth