6

I'm at my wits end on this one. I need to write some Chinese characters to a text file. The following method works however the newlines get stripped so the resulting file is just one super long string.

I tried inserting every known unicode line break that I know of and nothing. Any help is greatly appreciated. Here is snippet:

import codecs   
file_object = codecs.open( 'textfile.txt', "w", "utf-8" )
xmlRaw = (data to be written to text file )    
newxml = xmlRaw.split('\n')
for n in newxml:
    file_object.write(n+(u'2424'))# where \u2424 is unicode line break    
Chris Hall
  • 871
  • 6
  • 13
  • 21
  • `u'\n'` is Unicode line break. – user2357112 Aug 09 '13 at 22:48
  • What do you mean by "converted to UTF-8"? What form is it initially? (If it's ASCII text, it's already UTF-8.) – user2357112 Aug 09 '13 at 22:49
  • its data parsed from an XML file that includes chinese characters. I received character encoding errors when writing to a txt file however it outputted to the console fine. Using this method I can save the Chinese characters, however newlines are dropped – Chris Hall Aug 09 '13 at 22:52
  • \u2424 is not an actual newline, it's the "symbol for newline"; in text it will usually be rendered as an "n" and an "l" next to each other, but it will not actually break lines. – timotimo Aug 10 '13 at 00:10
  • In python, unicode is actually a standalone type, and one can cast str into unicode, see: http://docs.python.org/2/howto/unicode.html#the-unicode-type – Patrick the Cat Aug 10 '13 at 00:47

3 Answers3

4

If you use python 2, then use u"\n" to append newline, and encode internal unicode format to utf when you write it to file: file_object.write((n+u"\n").encode("utf")) Ensure n is of type unicode inside your loop.

rtxndr
  • 872
  • 1
  • 9
  • 20
0

I had the same problem to the same effect (wit's end and all). In my case, it was not an encoding issue, but the need to replace every '\n' with '\r\n', which led to better understanding the difference between line-feeds and carriage returns, and the fact that Windows editors often require \r\n for line breaks: 12747722

marc_a
  • 21
  • 3
0

The easiest way to do that is using the combination of "\r\n" as marc_a said.

So, your code should look like this:

import codecs   
file_object = codecs.open( 'textfile.txt', "w", "utf-8" )
xmlRaw = (data to be written to text file )    
newxml = xmlRaw.split('\n')
for n in newxml:
    file_object.write(n+u"\r\n")
Ferd
  • 1,273
  • 14
  • 16