
I have read many articles on this but still cannot decide, and I think an answer would be useful to others, especially regarding Pandas behaviour in Python 2.7.

Is it better to cast string columns to str or to unicode in Pandas under Python 2.7 (Option 1 or Option 2)?

Option 1:

 import pandas as pd

 df = pd.DataFrame({'b': ['ホテ', '・旅館', 'ホテル']})
 df = df.astype({'b': 'unicode'})

Option 2:

 df = pd.DataFrame({'b': ['ホテ', '・旅館', 'ホテル']})
 df = df.astype({'b': 'str'})

Based on the references, should everything be converted to Unicode before any processing?
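To make the difference concrete, here is a minimal sketch of what "convert to Unicode first" could look like. It assumes the column holds UTF-8 encoded byte strings (which is what plain `'ホテ'` literals are in Python 2.7 when the source file is saved as UTF-8) and uses an explicit `Series.str.decode('utf-8')` rather than `astype`. The column name `b_text` and the two length lists are only for illustration.

```python
# -*- coding: utf-8 -*-
import pandas as pd

# Assumption: the raw data arrives as UTF-8 encoded byte strings.
raw = [u'ホテ'.encode('utf-8'),
       u'・旅館'.encode('utf-8'),
       u'ホテル'.encode('utf-8')]
df = pd.DataFrame({'b': raw})

# Decode explicitly, naming the encoding, instead of relying on a default.
df['b_text'] = df['b'].str.decode('utf-8')

# Length of the byte strings versus length of the decoded text.
byte_lengths = df['b'].map(len).tolist()
char_lengths = df['b_text'].map(len).tolist()

print(byte_lengths)
print(char_lengths)
```

The byte column reports UTF-8 byte counts, while the decoded column reports character counts, so any length- or slice-based processing is only meaningful on the decoded column.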

References: Python str vs unicode types

tensor
  • It depends on what you want to do with the `DataFrame`. For example, if you want to count the number of characters, you'll need `unicode`. If you want to pass the data through a socket, you'll need 'bytes' (Python2 `str`s). For most purposes you'll probably want `unicode` but it's hard to make a blanket statement. – unutbu Jan 07 '17 at 17:10
  • Sounds like an interesting view. –  Jan 07 '17 at 19:51
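The trade-off described in the comment above can be sketched in plain Python, without Pandas. This is only an illustration: the variable names are made up, and UTF-8 is assumed as the wire encoding. Text is kept as unicode while it is being processed and encoded to bytes only at the boundary where bytes are required (for example, just before a socket send).

```python
# -*- coding: utf-8 -*-
text = u'ホテル'                  # unicode text, used for processing
payload = text.encode('utf-8')   # bytes, what a socket or file would need

char_count = len(text)           # counts characters
byte_count = len(payload)        # counts UTF-8 bytes

# Decoding with the same encoding recovers the original text.
round_trip = payload.decode('utf-8')

print(char_count)
print(byte_count)
```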

0 Answers