I'm reading a CSV file as follows:
data = pd.read_csv('news.csv')
It contains news and category as columns. I need to tokenize the words in the news column.
The problem is that each line of text in the news column has a b at the beginning:
b'Longevity Increase Seen Around the World: WHO'
b'Chikungunya spreading, mosquito-borne virus ...
I tried the answers to "How do I get rid of the b-prefix in a string in python?", but they apply to byte-encoded strings. So,
line = data['news'][0]
line.decode('utf-8')
would cause:
AttributeError: 'str' object has no attribute 'decode'
Each of those lines is of type str. How do I remove those b's?
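To make the situation concrete, here is a minimal sketch that reproduces it without the CSV. The sample value is copied from the first row shown above; that read_csv returns exactly this string for the cell is an assumption, since the file itself is not shown.

```python
# Assumed cell value: the text of the first row as it appears in the news column.
line = "b'Longevity Increase Seen Around the World: WHO'"

# The value is a plain str, so the b and the quotes are literal characters in it.
print(type(line))
print(line.startswith("b'"))

# Calling decode() fails because str has no such method in Python 3.
try:
    line.decode('utf-8')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

This suggests the b is part of the stored text itself rather than a marker of a bytes object, which would explain why the byte-decoding answers do not apply.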