I have a simple script parse_dict_pl.py:

# -*- coding: utf-8 -*-

f = open('polish.txt','r')
print "Polish letters: ęóąśłżźćń"
for l in f:
        print l

The file polish.txt contains the Polish letters: ęóąśłżźćń
I run the script from the Windows command line as follows:

python parse_dict_pl.py

and the result is:

Polish letters: ęóąśłżźćń
ęóąśłżźćń

How can I properly print Polish letters, both those hardcoded in the script and those loaded from the file?

Pawel

psmith
  • I ran the print statement in a Linux environment and it prints the letters correctly. Perhaps try to make sure that it is of type `str`? It might also just be the way that Windows cmd displays non-Latin characters, but I am not sure. – Renier Aug 19 '15 at 07:01
  • Which OS and version? – dsgdfg Aug 19 '15 at 07:04
  • Do you have these fonts installed on your system? In the Windows terminal there are always problems even when they are installed, which I have faced with Azerbaijani, Turkish and Russian text. On my Linux machine (Ubuntu 12.04), without any configuration, it prints Polish and Persian (Arabic script) nicely: `>>> print "Polish letters: ęóąśłżźćń"` gives `Polish letters: ęóąśłżźćń`, and `>>> print "که حتا تویه این شرایطم پریودی"` gives `که حتا تویه این شرایطم پریودی`. – marmeladze Aug 19 '15 at 07:04
  • OS: Windows 7 Enterprise, SP1 – psmith Aug 19 '15 at 07:22
  • When I run this script from Cygwin, the letters hardcoded in the script are displayed properly, but those loaded from the file are still invalid. I've noticed that `len(l)` equals 18. If I define a variable `a = 'ą'` and change the loop to `for l in f: for i in range(len(l)): print a == l[i]`, I get all False. – psmith Aug 19 '15 at 07:27
  • possible duplicate of [Python, windows console and encodings (cp 850 vs cp1252)](http://stackoverflow.com/questions/9226516/python-windows-console-and-encodings-cp-850-vs-cp1252) – wenzul Aug 19 '15 at 07:38
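
The `len(l)` equals 18 observation above is what you would expect if the nine letters are stored as UTF-8 and read as raw bytes, since each of these letters takes two bytes in UTF-8. A minimal sketch, assuming polish.txt is saved as UTF-8 (the question does not state the file's encoding):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u'ęóąśłżźćń'          # nine Polish letters as a Unicode string
raw = text.encode('utf-8')   # the bytes a UTF-8 file would hold (assumed encoding)

print(len(text))             # 9 characters
print(len(raw))              # 18 bytes, two per letter

# Decoding the bytes gives back the original characters, which is why
# comparing single bytes against a whole letter never matches.
decoded = raw.decode('utf-8')
print(decoded == text)       # True
```

This would also explain why indexing `l[i]` byte by byte never equals `'ą'`: a single byte is only half of the letter.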

1 Answer

Try:

# -*- coding: utf-8 -*-
from unidecode import unidecode

text = u'ęóąśłżźćń'
result = u''

for ch in text:
    try:
        # Keep the character if cp1252 can represent it.
        result += ch.encode('cp1252').decode('cp1252')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Otherwise fall back to an ASCII transliteration.
        result += unidecode(ch)

print result

It's producing the output:

eóaslzzcn

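You have to install the unidecode library first (it is available on PyPI). Note that this transliterates every letter that cp1252 cannot represent to plain ASCII instead of printing it unchanged; only ó survives as-is.
I've put the Polish letters in a variable; you can adapt this to your needs. Hope it helps.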
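
If transliteration is not what you want, a different approach is to decode the file into Unicode strings when reading it. The sketch below is only an illustration under assumptions: it assumes the file is UTF-8, `read_text_lines` is a hypothetical helper name, and a temporary file stands in for polish.txt. Whether the console can then display the letters still depends on its code page and font.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile


def read_text_lines(path, encoding='utf-8'):
    """Hypothetical helper: return a file's lines as Unicode strings."""
    with io.open(path, 'r', encoding=encoding) as f:
        return [line.rstrip(u'\r\n') for line in f]


# Demo with a throwaway file standing in for polish.txt (assumed UTF-8).
sample = u'ęóąśłżźćń'
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(sample + u'\n')
    lines = read_text_lines(path)
finally:
    os.remove(path)

print(len(lines[0]))        # 9: characters, not bytes
print(lines[0] == sample)   # True
```

`io.open` is used here because it accepts an `encoding` argument on both Python 2 and Python 3, so each line comes back already decoded.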