0

I'm trying to process a Korean text file with Python, but it fails when I try to open the file with the utf-8 encoding.

#!/usr/bin/python
#-*- coding: utf-8 -*-


f = open('tag.txt', 'r', encoding='utf=8')
s = f.readlines()

z = open('tagresult.txt', 'w')
y = z.write(s)
z.close
=============================================================
Traceback (most recent call last):
  File "C:\Users\******\Desktop\tagging.py", line 5, in <module>
    f = open('tag.txt', 'r', encoding='utf=8')
TypeError: 'encoding' is an invalid keyword argument for this function
[Finished in 0.1s]

==================================================================

And when I just open a Korean txt file encoded with utf-8, the characters come out garbled like this. What can I do?

\xc1\xc1\xbe\xc6\xc1\xf6\xb4\xc2\n', '\xc1\xc1\xbe\xc6\xc7\xcf\xb0\xc5\xb5\xe7\xbf\xe4\n', '\xc1\xc1\xbe\xc6\xc7\xcf\xbd\xc3\xb4\xc2\n', '\xc1\xcb\xbc\xdb\xc7\xd1\xb5\xa5\xbf\xe4\n', '\xc1\xd6\xb1\xb8\xbf\xe4\

cuongnv23
  • Can you tell us if it is Python 2 or 3? – WombatPM Aug 08 '16 at 00:03
  • I use Python 2, and it still doesn't work when I correct the typo. –  Aug 08 '16 at 05:53
  • Possible duplicate of [Backporting Python 3 open(encoding="utf-8") to Python 2](http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2) – tripleee Aug 08 '16 at 09:01

2 Answers

0

In Python 2, the built-in open() function does not take an encoding parameter. Instead, you read each line as a byte string and decode it to unicode yourself, for example with line.decode('utf-8'). This article on the kitchen (as in kitchen sink) modules provides details and some lightweight utilities for working with unicode in Python 2.x.
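As a rough sketch of the "read bytes, then decode" route: the byte string below is copied from the output in the question, and decode_line is a hypothetical helper, not a library function. The byte 0xC1 never occurs in valid UTF-8, so the file may actually be saved in a legacy Korean encoding such as cp949; that is only a guess from the sample shown, which is why the helper tries a list of candidate encodings.

```python
# -*- coding: utf-8 -*-
# First line of the garbled output shown in the question, as raw bytes.
raw = b'\xc1\xc1\xbe\xc6\xc1\xf6\xb4\xc2\n'


def decode_line(data, encodings=("utf-8", "cp949")):
    """Hypothetical helper: try each candidate encoding in order and
    return (text, encoding_name) for the first one that decodes cleanly."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# utf-8 is tried first and rejected, so the fallback is used here.
text, used = decode_line(raw)
print(used)
```

Once the real encoding is known, the same data.decode(name) call can be applied to every line read from a file opened in binary mode ('rb').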

WombatPM
0

I don't know Korean and don't have a sample string to try, but here is some advice:

1

f = open('tag.txt', 'r', encoding='utf=8')

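You have a typo here: the encoding name should be spelled utf-8, not utf=8. Note, however, that the TypeError in your traceback is not caused by the typo; it is raised because Python 2's built-in open() does not accept an encoding keyword at all.

The default mode of open() is 'r', so you don't have to specify it again.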

2 Don't call open() on its own; use a context manager (the with statement) so the file is closed for you, like this:

with open('tagresult.txt', 'w') as f:
    # s is the list returned by readlines(), so use writelines() instead of write()
    f.writelines(s)
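Combining the context manager with an encoding-aware open, a minimal sketch might look like the following. It assumes io.open, which is available in Python 2.6+ and Python 3 and accepts an encoding argument. The sample lines and the temporary directory are made up for illustration, and the file is assumed to really be UTF-8.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample data standing in for the contents of tag.txt.
sample_lines = [u"좋아지는\n", u"좋아하거든요\n"]

# Work in a throwaway directory so the example is self-contained.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "tag.txt")
dst = os.path.join(workdir, "tagresult.txt")

# Create the input file as UTF-8.
with io.open(src, "w", encoding="utf-8") as f:
    f.writelines(sample_lines)

# io.open decodes while reading, so each line is unicode text.
with io.open(src, encoding="utf-8") as f:
    lines = f.readlines()

# Write the lines back out, again as UTF-8; the file is closed automatically.
with io.open(dst, "w", encoding="utf-8") as out:
    out.writelines(lines)
```

On Python 2, io.open expects unicode objects when writing in text mode, which is why the sample strings carry the u prefix.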
cuongnv23