47

Ansi to UTF-8 using python causing error

I tried the answer there to convert ansi to utf-8.

import io

with io.open(file_path_ansi, encoding='latin-1', errors='ignore') as source:
    with open(file_path_utf8, mode='w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)

But I got "TypeError: 'encoding' is an invalid keyword argument for this function"

I tried with

with io.open(file_path_ansi, encoding='cp1252', errors='ignore') as source:

as well, and got the same error.

Then I tried

import io

with io.open(file_path_ansi, encoding='latin-1', errors='ignore') as source:
    with io.open(file_path_utf8, mode='w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)

and still got the same error. Switching the source encoding to cp1252 made no difference.

I learned from several stackoverflow questions that

TypeError: 'encoding' is an invalid keyword argument for this function

is a frequently occurring error message in Python 2.x.

But most of the answers suggested switching to Python 3 in one way or another.

Is it really impossible to convert an ANSI text file to a UTF-8 text file in Python 2.x? (I use 2.7.)

user3123767
  • I doubt that you got the same error when you used `io.open()` for both calls. Please convert your code snippet to a complete program and re-run. If you still get the error, please copy-paste the entire program (it should only be 7 lines or so) and the entire error output into your question. – Robᵩ Jul 31 '14 at 03:20
  • possible duplicate of [Ansi to UTF-8 using python causing error](http://stackoverflow.com/questions/24893173/ansi-to-utf-8-using-python-causing-error) – Robᵩ Jul 31 '14 at 03:23
  • For reference: [Pragmatic Unicode, or, How do I stop the pain?](http://pyvideo.org/video/948/pragmatic-unicode-or-how-do-i-stop-the-pain) – wwii Jul 31 '14 at 03:25
  • Using Python 3 constructs in Python 2 is inevitably an error, albeit sometimes not one with an explicit error message. In the worst case, your code runs, but does the wrong thing. You need to understand the differences between Python 2 and 3 and settle on one or the other. (Going forward, Py3 is the recommended choice.) – tripleee Jul 31 '14 at 03:45

4 Answers

73

For Python 2.7, use io.open() in both locations.

import io
import shutil

with io.open('/etc/passwd', encoding='latin-1', errors='ignore') as source:
    with io.open('/tmp/goof', mode='w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)

The above program runs without errors on my PC.

Robᵩ
  • Thank you! Perhaps I made some mistake when I tried io. Your code didn't cause any error message. But once I open the output in notepad++, non-English characters are all broken. I checked with both latin-1 and cp1252. – user3123767 Jul 31 '14 at 05:17
  • I converted https://drive.google.com/file/d/0B1sEqo7wNB1-Mk5KZFM2SmxtbTA/edit?usp=sharing to https://drive.google.com/file/d/0B1sEqo7wNB1-RzE3VTc0SFhGR1U/edit?usp=sharing As you can see, the non-English characters are broken. Is this because it's Python 2.7? – user3123767 Jul 31 '14 at 05:46
  • For anyone still supporting older versions, this encoding-enabled variant of `open` (along with the rest of the `io` module) was [introduced in Python 2.6](https://docs.python.org/2.6/library/io.html#io.open). For 2/3 code compatibility, [it's still in Python 3](https://docs.python.org/3/library/io.html#io.open), where it's just an alias to the built-in `open` function. – Kevin J. Chase May 11 '15 at 03:15
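
To make the approach in this answer easy to try, here is a minimal sketch that wraps it in a function and runs it on a throwaway file. The function name `convert_to_utf8`, the `cp1252` default and the temporary file names are illustrative assumptions, not part of the original answer. Unlike the answer, it leaves out `errors='ignore'`, so bytes that the source codec cannot decode raise an error instead of being dropped silently.

```python
import io
import os
import shutil
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Copy a text file, decoding with src_encoding and writing UTF-8.

    convert_to_utf8 is a hypothetical helper name; src_encoding is an
    assumption and must match how the source file was really saved.
    """
    # io.open accepts encoding= on both Python 2.6+ and Python 3.
    with io.open(src_path, encoding=src_encoding) as source:
        with io.open(dst_path, mode="w", encoding="utf-8") as target:
            shutil.copyfileobj(source, target)


# Build a small cp1252 sample file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "ansi.txt")
dst = os.path.join(tmp_dir, "utf8.txt")
with io.open(src, "wb") as handle:
    handle.write(u"caf\u00e9".encode("cp1252"))

convert_to_utf8(src, dst)

# Read the result back as raw bytes to inspect the UTF-8 output.
with io.open(dst, "rb") as handle:
    result = handle.read()

shutil.rmtree(tmp_dir)
```

If the output still looks wrong in an editor, the likely cause is that `src_encoding` does not match the encoding the file was actually saved in, rather than the Python version.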
12

This is how you can convert ansi to utf-8 in Python 2 (you just use normal file objects):

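# Binary mode passes the bytes through untouched (no newline translation on Windows).
with open(file_path_ansi, "rb") as source:
    with open(file_path_utf8, "wb") as target:
        # Decode the raw bytes as Latin-1, then re-encode them as UTF-8.
        target.write(source.read().decode("latin1").encode("utf8"))

Thomas Hobohm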
  • Thank you.. but this code has the same problem as the one above. It breaks non-English characters. I converted drive.google.com/file/d/0B1sEqo7wNB1-Mk5KZFM2SmxtbTA/… to drive.google.com/file/d/0B1sEqo7wNB1-RzE3VTc0SFhGR1U/… As you can see, the non-English has been broken. Is this because it's python 2.7 ? – user3123767 Aug 01 '14 at 15:04
  • It's because `latin-1` doesn't support non-English characters. – Thomas Hobohm Aug 01 '14 at 17:09
  • I tested with `cp1252` instead of `latin1` and I am getting the same broken output. When I open the ANSI txt file using notepad++, I correctly see those non-English characters. So I thought Python would be able to read these non-English characters, too, even if it is ANSI. But there's no way? – user3123767 Aug 02 '14 at 06:15
  • Well, if there are non-English characters, then your file isn't ANSI (or latin1). – Thomas Hobohm Aug 02 '14 at 15:03
  • Hmm.. But when I go to "encoding" in notepad++, then I see that "ANSI" is checked. – user3123767 Aug 03 '14 at 03:30
  • See this at https://drive.google.com/file/d/0B0E6P9zEQ6w0MmF6QkhwN2RmY1U/edit?usp=sharing notepad++ is saying it is ANSI, and it is displaying non-English. Perhaps notepad++ has some special ability to display non-English? – user3123767 Aug 03 '14 at 03:34
  • Yes, notepad++ has that ability, as do many text editors. In general, though, ANSI (or latin-1) doesn't include non-english characters. – Thomas Hobohm Aug 03 '14 at 03:50
  • Hmm.. if converting Ansi to utf8 without breaking non-English is impossible for python, do you know any other language that can do this? – user3123767 Aug 03 '14 at 03:57
  • Not really, because ANSI doesn't support non-english characters, so if you're saving your file as ANSI with non-english characters in it, then you're kind of screwed. – Thomas Hobohm Aug 03 '14 at 03:58
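
A note on the thread above, offered as a sketch rather than a diagnosis: as far as I know, the "ANSI" label in Notepad++ refers to the Windows system code page, which differs from machine to machine, and not to Latin-1 specifically. Latin-1 and cp1252 do cover accented Western European letters, but decoding a file saved in some other code page with either of them produces wrong characters without raising any error. The example below assumes, purely for illustration, that the text was saved as cp949 (a Korean code page); the asker's real code page is not stated anywhere in the thread.

```python
# Hypothetical sample text: two Hangul syllables, written as escapes so the
# source file itself stays plain ASCII.
sample = u"\ud55c\uae00"

# Bytes as an editor would save them under the ASSUMED code page cp949.
raw = sample.encode("cp949")

# Latin-1 maps every byte to one character, so this never fails,
# but the result is not the original text.
wrong = raw.decode("latin-1")

# Decoding with the code page the file was actually saved in recovers the text.
right = raw.decode("cp949")

# Once the text is decoded correctly, encoding to UTF-8 is lossless.
utf8_bytes = right.encode("utf-8")
```

Under that assumption, the conversion code in the answers works unchanged once the source encoding is replaced with the file's real code page.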
8

TypeError: 'encoding' is an invalid keyword argument for this function

open('textfile.txt', encoding='utf-16')

Use io.open instead; it works in both Python 2.7 and Python 3.6:

import io
io.open('textfile.txt', encoding='utf-16')
1

I had the same issue when I tried to write bytes to a file. My point is that bytes are already encoded, so passing the encoding keyword when writing them leads to an error.

Taras Vaskiv
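
A minimal sketch of the bytes-versus-text point, assuming `io.open` is used: an `encoding` argument only makes sense in text mode, and the Python versions I am aware of reject it in binary mode with a `ValueError` (not the `TypeError` from the question). The helper names and the temporary file are illustrative, not from the answer.

```python
import io
import os
import tempfile


def binary_mode_rejects_encoding(path):
    """Return True if io.open refuses encoding= together with binary mode."""
    try:
        handle = io.open(path, "wb", encoding="utf-8")
    except ValueError:
        return True
    handle.close()
    return False


def write_text_as_utf8(path, text):
    """Encode text to UTF-8 by hand and write the resulting bytes."""
    # No encoding= here: the data is already bytes when it reaches the file.
    with io.open(path, "wb") as handle:
        handle.write(text.encode("utf-8"))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
rejected = binary_mode_rejects_encoding(demo_path)
write_text_as_utf8(demo_path, u"caf\u00e9")

# Read the file back as raw bytes to see what was written.
with io.open(demo_path, "rb") as handle:
    written = handle.read()
```

Either encode by hand and write in binary mode, as here, or open in text mode with `encoding=` and write text; mixing the two is what triggers the error.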