Consider this:

s = u"おはよう"
print len(s)
for c in s: print c

The output is

4
お
は
よ
う

which is what I expect

Now with emojis:

s = u"hi "

Output is

5
h
i

????
????

Why is that? How can I fix it? I have looked at various links before but can't get my head around it. Ideally I would like a solution that works for both Japanese and emoji, but if it only works for ASCII and emoji, I'm fine with that too.
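A minimal sketch of one likely explanation for the length of 5, assuming the interpreter stores strings as UTF-16 code units: a character outside the Basic Multilingual Plane takes two code units (a surrogate pair), while each Japanese character takes one. The emoji in the original snippet did not survive, so `U+1F600` is a stand-in, and `utf16_code_units` is a hypothetical helper, not something from the question.

```python
def utf16_code_units(text):
    """Count the UTF-16 code units needed to store `text`.

    Each UTF-16 code unit is 2 bytes, so the encoded byte length is halved.
    """
    return len(text.encode("utf-16-le")) // 2


# "おはよう" written with escapes so the file needs no encoding declaration.
japanese = u"\u304a\u306f\u3088\u3046"

# Assumption: U+1F600 stands in for the emoji lost from the original post.
with_emoji = u"hi \U0001F600"

print(utf16_code_units(japanese))    # one code unit per character
print(utf16_code_units(with_emoji))  # "h", "i", " " plus a surrogate pair
```

If that assumption holds, the two `????` lines in the output would be the two halves of the surrogate pair printed separately.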

Thomas
  • Might be a version issue. Works fine in Python 3.5. – Mohammad Athar Feb 08 '17 at 12:44
  • @user2539738 Well, Unicode handling is rather different in Python 3 vs Python 2. However, it works fine for me on Python 2.6.6. – PM 2Ring Feb 08 '17 at 12:45
  • Are you using Windows? Is it 64 bit or 32 bit Python? – PM 2Ring Feb 08 '17 at 12:46
  • Mac, Python 2.7.10 – Thomas Feb 08 '17 at 12:47
  • It sounds like you have a narrow build. Please see [Python returns length of 2 for single Unicode character string](http://stackoverflow.com/a/29109996/4014959) for more info. – PM 2Ring Feb 08 '17 at 12:48
  • Anyway, the advice is to upgrade to Python 3.5 or 3.6. There is no need to use a version as ancient as Python 2.7 for this kind of work, doubly so given that easier Unicode handling is one of the strengths of the Python 3.x series. – jsbueno Feb 08 '17 at 12:52
  • (For example, the Python 3.x series no longer has the "narrow build" vs "wide build" distinction, which is the likely cause of your reported problem.) – jsbueno Feb 08 '17 at 12:53
  • I have installed Python 3.x and it works fine. It took me forever to find a good reason to make the switch. Thanks, guys. – Thomas Feb 08 '17 at 12:54
  • Well done, Thomas! It'll take you a little while to get used to the differences, but once you do, you'll wonder how you ever tolerated Python 2's string / Unicode madness. :) – PM 2Ring Feb 08 '17 at 12:57
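For anyone who cannot upgrade, the narrow-build diagnosis in the comments can be checked and worked around roughly as follows. This is a sketch under stated assumptions, not a tested recipe from the thread: `is_narrow_build` and `code_points` are hypothetical helper names, and `U+1F600` stands in for the emoji missing from the question. `sys.maxunicode` is `0xFFFF` on a narrow Python 2 build and `0x10FFFF` on wide builds and on Python 3.3+.

```python
import re
import sys

# A high surrogate followed by a low surrogate, or else any single character.
# On a narrow build this glues the two halves of an emoji back together;
# on a wide build or Python 3 only the second alternative ever matches.
_CODE_POINT = re.compile(u"[\ud800-\udbff][\udc00-\udfff]|.", re.DOTALL)


def is_narrow_build():
    """Return True if this interpreter stores unicode as UTF-16 code units."""
    return sys.maxunicode == 0xFFFF


def code_points(text):
    """Split `text` into a list with one entry per Unicode code point."""
    return _CODE_POINT.findall(text)


# Assumption: U+1F600 stands in for the emoji lost from the original post.
s = u"hi \U0001F600"
chars = code_points(s)
print(len(chars))
```

Looping over `code_points(s)` instead of `s` should then yield whole characters for both the Japanese and the emoji strings, whichever build is in use.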