█ character string indexed in python

Question

I'm trying to get the index of 'J' in a string similar to myString = "███ ███ J ██", so I use myString.find('J'), but it returns a much higher value than I expect. If I replace '█' with 'M' or another letter of the alphabet, I get a lower value. I don't understand what causes this.

Because it isn't an ASCII character. If, for example, Python uses the common UTF8 encoding scheme for its internal strings, this character will be represented by *three* one-byte codes: `0xE2 0x96 0x88`. — Jongware, Jun 09 '15 at 22:34
Which python version are you using? It could be an issue with unicode handling in python 2.x. — Aereaux, Jun 09 '15 at 22:34
`lower value=-1` because except J there is no other alphabet — Ajay, Jun 09 '15 at 22:35
@Aereaux It is. If you declare it as a Unicode String, i.e. myString = u"███ ███ J ██", find works fine. — Sinkingpoint, Jun 09 '15 at 22:36

Aereaux · Accepted Answer · 2015-06-09T23:56:13.370

2

Try myString = u"███ ███ J ██". This makes it a Unicode string instead of the Python 2.x default, a byte string (str), in which each '█' takes up several bytes and pushes the index of 'J' higher.
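To see where the two values come from, the same search can be run on a text string and on its encoded bytes. This is a minimal sketch, assuming the source was saved as UTF-8 (the question does not say); the string is built from `\u2588` escapes so the example does not depend on the file's encoding, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
block = u"\u2588"  # the '█' character

# Same layout as the question's string: "███ ███ J ██"
text = block * 3 + u" " + block * 3 + u" J " + block * 2
raw = text.encode("utf-8")  # assumption: UTF-8, where '█' is three bytes

char_index = text.find(u"J")  # counts characters
byte_index = raw.find(b"J")   # counts bytes

print(char_index)  # 8
print(byte_index)  # 20
```

Six '█' characters come before 'J', and each costs two extra bytes, which accounts for the difference of 12.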

If you are reading it from a file or a file-like object, instead of doing file.read(), do file.read().decode('utf-8-sig'). In Python 2.x, read() returns a byte string, and decoding it gives a Unicode string.
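For the file case, a minimal sketch follows. It swaps in `io.open` with an explicit `encoding` argument, which returns text directly on both Python 2.6+ and Python 3, rather than converting after `read()`. It assumes the file is UTF-8; the temporary file is only a hypothetical stand-in for the asker's "map/map1".

```python
import io
import os
import tempfile

block = u"\u2588"  # the '█' character
content = block * 3 + u" " + block * 3 + u" J " + block * 2

# Hypothetical stand-in for the real map file.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Write the sample data as UTF-8 with a BOM.
    with io.open(path, "w", encoding="utf-8-sig") as f:
        f.write(content)
    # 'utf-8-sig' strips a leading BOM if one is present.
    with io.open(path, "r", encoding="utf-8-sig") as f:
        my_map = f.read()
finally:
    os.remove(path)

j_index = my_map.find(u"J")
print(j_index)  # 8
```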

edited Jun 09 '15 at 23:56

answered Jun 09 '15 at 22:39

Aereaux

score 2 · Answer 2 · edited May 23 '17 at 11:50

To check your encoding run: python -c 'import sys; print(sys.getdefaultencoding())'

For Python 2.x the output is ascii, which is the default encoding for your programs. To handle non-ASCII characters, the developers provided a unicode type. See for yourself: create a variable myString = u"███ ███ J ██" and call its .find('J') method. The u prefix tells the interpreter that this is a Unicode string. You can then use the variable as if it were a normal str.

I've used Unicode in some places where I should have written UTF-8. For the difference, check this great answer if you want to.

In Python 3.x, str is a Unicode string by default, so this problem does not occur.
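A quick way to check this on your own interpreter is sketched below. It prints the default encoding and compares the length of '█' as text with its length as UTF-8 bytes; the variable names are illustrative.

```python
import sys

block = u"\u2588"                   # the '█' character
utf8_bytes = block.encode("utf-8")  # its UTF-8 representation

default_encoding = sys.getdefaultencoding()
print(default_encoding)  # 'ascii' on Python 2.x, 'utf-8' on Python 3.x

print(len(block))       # 1 character
print(len(utf8_bytes))  # 3 bytes: 0xE2 0x96 0x88
```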

edited May 23 '17 at 11:50

Community

answered Jun 09 '15 at 22:49

kamarkiewicz

And what if my string comes from a file: `myFile = open("map/map1", "r")` `myMap = (myFile.read())` – mel Jun 09 '15 at 23:07
Use `myFile = open("map/map1", "r")` `myMap = (myFile.read().decode('utf-8-sig'))` – Aereaux Jun 09 '15 at 23:53

score 0 · Answer 3 · answered Jun 09 '15 at 22:50

Check the settings of the console/ssh client you are using. Set it to be UTF-8.

answered Jun 09 '15 at 22:50

Zoran Pavlovic
