9

How to convert pandas dataframe to unicode?

`messages=pandas.read_csv('data/SMSSpamCollection',sep='\t',quoting=csv.QUOTE_NONE,names=["label", "message"])
def split_into_tokens(message):
  message = unicode(message, 'utf8')  # convert bytes into proper unicode
  return TextBlob(message).words


messages.head().apply(split_into_tokens(messages))`

It gives the following error:

Traceback (most recent call last):
  File "minor.py", line 46, in <module>
    messages.head().apply(split_into_tokens(messages))
  File "minor.py", line 42, in split_into_tokens
    message = unicode(message, 'utf8')  # convert bytes into proper unicode
TypeError: coercing to Unicode: need string or buffer, DataFrame found
ADITYA KUMAR
  • Try messages.head().apply(split_into_tokens) and run it. Also note that 'apply' should not be used on the whole dataframe here; call it on a single column instead: df['column_name'].apply(some_function) – sandepp Feb 25 '17 at 14:59
  • I am adding it as an answer then – sandepp Feb 25 '17 at 16:05

2 Answers

9

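df.x.str.encode('utf-8')

This should fix your problem, where df is your dataframe and x is the column that holds the text. Note that str.encode turns unicode text into bytes; to go from bytes to unicode, as the question asks, use its counterpart str.decode('utf-8').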

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.encode.html
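To make the direction of each call concrete, here is a minimal sketch, assuming Python 3 and a recent pandas. The sample data and the names `df`, `encoded` and `decoded` are made up for illustration and are not from the question's file.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's "message" column.
df = pd.DataFrame({"message": ["Go until jurong point", "caf\u00e9 au lait"]})

# Series.str.encode: text -> UTF-8 bytes, element by element.
encoded = df["message"].str.encode("utf-8")

# Series.str.decode: UTF-8 bytes -> text, the direction the question asks about.
decoded = encoded.str.decode("utf-8")

print(type(encoded.iloc[0]))
print(decoded.tolist())
```

Both accessors work on a single column (a Series), not on the whole dataframe.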

jason m
2

Change the code

messages.head().apply(split_into_tokens(messages))

to

messages.head().apply(split_into_tokens)

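When you use 'apply', pass the function itself rather than the result of calling it; 'apply' supplies the argument for you. Your code calls split_into_tokens(messages) first, which hands the whole dataframe to unicode() and raises the TypeError. If you want to tokenize each message, it is probably best to apply the function to that column only, e.g. messages['message'].head().apply(split_into_tokens).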
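A minimal sketch of the corrected call, assuming Python 3 (where str is already unicode, so the unicode() call is not needed). The two sample rows are made up, and a plain whitespace split stands in for TextBlob(message).words so the example has no extra dependency.

```python
import pandas as pd


def split_into_tokens(message):
    # Stand-in tokenizer (assumption): whitespace split instead of TextBlob(message).words.
    return message.split()


# Hypothetical rows in the same shape as the question's dataframe.
messages = pd.DataFrame({
    "label": ["ham", "spam"],
    "message": ["Ok lar joking wif u", "Free entry in 2 a wkly comp"],
})

# Pass the function object; apply calls it once per value in the column.
tokens = messages["message"].head().apply(split_into_tokens)
print(tokens.tolist())
```

Writing apply(split_into_tokens(messages)) instead would run the function once on the whole dataframe before apply is ever reached, which is what the traceback in the question shows.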

sandepp