
I'm trying to get the index of 'J' in a string similar to myString = "███ ███ J ██", so I use myString.find('J'), but it returns a really high value. If I replace '█' with 'M' or another letter of the alphabet, I get a lower value. I don't understand what causes this.

Aereaux
mel

3 Answers


Try myString = u"███ ███ J ██". The u prefix makes it a Unicode string instead of the Python 2.x default byte string (str), so find() counts characters rather than bytes.
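A minimal sketch of the cause, written in Python 3 so it runs on a current interpreter. It assumes the Python 2 source file was saved as UTF-8, in which case a plain "..." literal holds the same bytes as text.encode("utf-8") below. Each █ (U+2588) takes three bytes in UTF-8, so find() on the byte string counts bytes. The variable names are illustrative only.

```python
# Python 3 sketch: compare a byte offset with a character offset.
text = "███ ███ J ██"           # text, like a Python 2 u"..." literal
raw = text.encode("utf-8")       # bytes, like a Python 2 plain "..." literal in a UTF-8 file

block_width = len("█".encode("utf-8"))  # bytes needed for a single █
byte_index = raw.find(b"J")              # offset counted in bytes
char_index = text.find("J")              # offset counted in characters

print(block_width, byte_index, char_index)  # 3 20 8
```

Replacing █ with M makes the two offsets equal, because ASCII letters take one byte each in UTF-8.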

If you are reading it from a file or a file-like object, do file.read().decode('utf-8-sig') instead of file.read().
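A sketch of the decode step for the file case, again in Python 3 syntax. It writes a throwaway file so it is self-contained; read_map is a hypothetical helper name, and the map file being UTF-8 is an assumption. The utf-8-sig codec drops a leading byte order mark if one is present. In Python 2, io.open(path, encoding="utf-8-sig") likewise returns unicode text.

```python
import os
import tempfile


def read_map(path):
    """Read a map file and return decoded text (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()              # bytes, like Python 2's file.read()
    return raw.decode("utf-8-sig")  # text; a leading BOM, if any, is removed


# Create a sample file that starts with a UTF-8 byte order mark.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("\ufeff███ ███ J ██".encode("utf-8"))

my_map = read_map(path)
os.remove(path)
print(my_map.find("J"))  # 8
```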

Aereaux

To check your default encoding, run: python -c 'import sys; print(sys.getdefaultencoding())'

For Python 2.x the output is ascii, and this is the default encoding for your programs. To handle non-ASCII characters, the developers provided the unicode type. See for yourself: create a variable myString = u"███ ███ J ██" and call its .find('J') method. The u prefix tells the interpreter that it is dealing with a Unicode string. You can then use this variable as if it were a normal str.
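A sketch of what the ascii default means in practice, written in Python 3 so the implicit Python 2 step is spelled out. Decoding the bytes of █ with ascii fails, while decoding them with the encoding they were actually written in (assumed to be UTF-8 here) yields text whose indices count characters. In Python 2, calling .encode() on a non-ASCII byte string triggers that same failing ascii decode behind the scenes.

```python
raw = "███ ███ J ██".encode("utf-8")  # bytes, like a Python 2 str literal

# The implicit step Python 2 performs with its 'ascii' default encoding.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # 0xE2, the first byte of █ in UTF-8, is not ASCII

text = raw.decode("utf-8")  # explicit decode with the real encoding
print(ascii_ok, text.find("J"))  # False 8
```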

I've used "Unicode" in some places where I should have written "UTF-8". For the difference, check this great answer if you want to.

In Python 3.x, str is Unicode text by default (and the default encoding is UTF-8), so this problem does not occur.

Community
kamarkiewicz
  • And what if my string comes from a file: myFile = open("map/map1", "r"); myMap = myFile.read() – mel Jun 09 '15 at 23:07
  • Use `myFile = open("map/map1", "r")` `myMap = myFile.read().decode('utf-8-sig')` – Aereaux Jun 09 '15 at 23:53

Check the settings of the console/SSH client you are using and set it to UTF-8.

Zoran Pavlovic