0

I have a text.ucs file that I am trying to decode using Python.

file = open('text.ucs', 'r')
content = file.read()
print content

My result is

\xf\xe\x002\22

I tried decoding it as UTF-16 and as UTF-8:

content.decode('utf-16')

and got this error:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 32-33: illegal encoding

Please let me know if I am missing anything or if my approach is wrong.

Edit: A screenshot was requested, so I have added one.

cyborg

4 Answers

1

The string is encoded as UTF-16-BE (big-endian). This works:

content.decode("utf-16-be")
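A minimal sketch of the whole round trip, in case it helps: read the file in binary mode, then decode the bytes as UTF-16-BE. The helper name `read_utf16_be` and the sample text are made up for illustration. Opening with 'rb' is an assumption on my part: text mode on Windows can alter or cut short the bytes before they reach the decoder, but nothing in this thread confirms that is what happened here.

```python
import io
import os
import tempfile


def read_utf16_be(path):
    """Return the file's contents decoded as big-endian UTF-16."""
    # Binary mode hands the decoder the exact bytes stored on disk.
    with io.open(path, 'rb') as f:
        raw = f.read()
    return raw.decode('utf-16-be')


# Build a small stand-in for text.ucs (hypothetical sample data).
sample = u'hello \u4e16\u754c'
fd, demo_path = tempfile.mkstemp(suffix='.ucs')
with os.fdopen(fd, 'wb') as f:
    f.write(sample.encode('utf-16-be'))

decoded = read_utf16_be(demo_path)
os.remove(demo_path)
```

With the real file, the call would be `read_utf16_be('text.ucs')`.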
filmor
  • @JacquesGaudin Unfortunately neither one works, but in the Python docs I see '-' and not '_'. – cyborg May 07 '18 at 10:54
  • 2
    @cyborg I executed this on the bytes that you provided just now, worked fine. The names with dashes and underscores are equivalent, first paragraph of https://docs.python.org/3/library/codecs.html#standard-encodings – filmor May 07 '18 at 10:55
  • >>> content.decode("utf_16_be") Traceback (most recent call last): File "", line 1, in File "C:\Python27\lib\encodings\utf_16_be.py", line 16, in decode return codecs.utf_16_be_decode(input, errors, True) UnicodeDecodeError: 'utf16' codec can't decode byte 0x5c in position 64: truncated data – cyborg May 07 '18 at 10:58
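The two points raised in these comments can be checked with a short sketch. It shows that the dash and underscore spellings resolve to the same codec, and that a byte string with an odd number of bytes fails with "truncated data". The two-character sample is made up, and this only illustrates the error; it does not establish why the asker's data had a stray trailing byte.

```python
import codecs

# Both spellings are looked up as the same codec.
same_codec = (codecs.lookup('utf-16-be').name ==
              codecs.lookup('utf_16_be').name)

good = u'hi'.encode('utf-16-be')  # 4 bytes, two complete code units
bad = good[:-1]                   # 3 bytes, the last code unit is cut in half

try:
    bad.decode('utf-16-be')
    reason = None
except UnicodeDecodeError as exc:
    # The decoder reports the incomplete trailing byte.
    reason = exc.reason

print(same_codec)
print(reason)
```

UTF-16 stores each code unit in two bytes, so an odd total length usually means the bytes were modified or cut off somewhere before decoding.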
1

As I understand it, you are using Python 2.x, but as far as I know the encoding parameter of open() was only added in Python 3.x. I am not an expert in Python 2.x, but you can look up io.open. For example, try:

import io

# io.open accepts an encoding argument in Python 2 as well
file = io.open('text.ucs', 'r', encoding='utf-8')
content = file.read()
print content

Note that the io module has to be imported first.
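A sketch of the same io.open route, written so it runs on both Python 2.7 and Python 3, and using UTF-16-BE (the encoding suggested in the other answer) rather than UTF-8. The temporary file and the sample text are stand-ins for text.ucs and are assumptions for the demo.

```python
import io
import os
import tempfile

sample = u'caf\u00e9'  # hypothetical sample text

# Create a temporary stand-in for text.ucs.
fd, demo_path = tempfile.mkstemp(suffix='.ucs')
os.close(fd)

# Write the sample as big-endian UTF-16 (this codec adds no byte order mark).
with io.open(demo_path, 'w', encoding='utf-16-be') as f:
    f.write(sample)

# Read it back as text; io.open does the decoding.
with io.open(demo_path, 'r', encoding='utf-16-be') as f:
    text = f.read()

# Look at the raw bytes to see the two-bytes-per-character layout.
with io.open(demo_path, 'rb') as f:
    raw = f.read()

os.remove(demo_path)
print(repr(text))
```

If the real file starts with a byte order mark, passing encoding='utf-16' instead lets the codec pick the byte order from that mark.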

0

You can specify which encoding to use with the encoding argument:

with open('text.ucs', 'r', encoding='utf-16') as f:
    text = f.read()
MrLeeh
  • The error message I got: Traceback (most recent call last): File "", line 1, in TypeError: 'encoding' is an invalid keyword argument for this function – cyborg May 07 '18 at 10:48
  • Maybe you forgot that when you use `with open()` you need to bind the file to a name, for example: `with open('text.ucs', 'r', encoding='utf-16') as file` – NEStenerus nester May 07 '18 at 10:55
  • I did it, I am aware of with usage :) – cyborg May 07 '18 at 10:56
  • I really don't know why it doesn't work. I tested it 20 seconds ago and this construction raised no errors. My full code is `with open(source, 'r', encoding='utf-8') as csvfile:`... – NEStenerus nester May 07 '18 at 11:03
  • I have added screenshot in question, please check – cyborg May 07 '18 at 11:14
0

Your string needs to be decoded with the UTF-8 encoding. You can do what I did to decode your string:

# Python 3: the built-in open() accepts an encoding argument
f = open('text.ucs', 'r', encoding='utf-8')
print(f.read())
Skiller Dz