I am having issues writing data to a file due to £ signs being in my string/list.

For example, in my code below, x is built by a series of appends from various regex searches, matches, subs and generic trims/splits.

# -*- coding: utf-8 -*-
x = [u'Loc ', u'352', '1', '51', '3D2', u'Student Total \xa3540.00', u'Discount \xa235.00', '\n', u'Rec ', u'352', '2', '51', '5S1', u'Student Total \xa3540.00', u'Discount \xa235.00', '\n']
with open('test.txt','w') as dfile:
    dfile.write('\n'.join(x)) # UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 71: ordinal not in range(128)
    dfile.write(x) # TypeError: expected a character buffer object

I am trying to write x to file so it appears like:

Loc
352
1
51
3D2
Student Total £3540.00
Discount £235.00

Rec
352
2
51
5S1
Student Total £3540.00
Discount £235.00

Does anyone know how I can achieve this?

EDIT

I now can't get it to compare the file's contents with x and save only if they differ:

with open('test.txt','r') as dfile:
    dfiler = dfile.read()
    dfiler = dfiler.decode("UTF-8")
    if dfiler == x:
        print "same, no need to save"
    else:            
        with open('test.txt','w') as result_end_datafile:
            dfile.write('\n'.join(x).encode("UTF-8"))
tshepang
Ryflex

1 Answer

You need to encode the unicode string before writing:

dfile.write('\n'.join(x).encode("UTF-8"))
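
As a small illustration of what that `encode` call does (a sketch only; the variable names are made up for this example): the default ASCII codec has no byte for `u'\xa3'`, while UTF-8 represents it as two bytes.

```python
# Illustrative names only; none of these appear in the question's code.
pound = u'\xa3'  # POUND SIGN
cent = u'\xa2'   # CENT SIGN

# ASCII covers code points 0-127 only, so this encode fails.
try:
    pound.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent both characters, using two bytes each.
pound_bytes = pound.encode('utf-8')
cent_bytes = cent.encode('utf-8')

# A \x escape consumes exactly two hex digits, so the digits that
# follow \xa3 here are ordinary text.
line = u'Discount \xa335.00'
```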

Alternatively, you can use codecs.open() in Python 2.x and pass the encoding as an argument when opening the file:

import codecs

with codecs.open('test.txt', 'w', encoding="UTF-8") as dfile:
    dfile.write('\n'.join(x))
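
A similar sketch with `io.open()`, which takes the same `encoding` argument and behaves the same way on Python 3. The shortened list and the temporary path are assumptions made for the example, so it does not overwrite a real `test.txt`:

```python
import io
import os
import tempfile

# Shortened sample data (an assumption for this sketch); u'\xa3' is the pound sign.
x = [u'Loc ', u'352', u'Student Total \xa3540.00', u'Discount \xa335.00']

# Hypothetical scratch location instead of the real test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# io.open() encodes on write, so unicode text is passed in directly.
with io.open(path, 'w', encoding='utf-8') as dfile:
    dfile.write(u'\n'.join(x))

# Reading back with the same encoding yields unicode text again.
with io.open(path, 'r', encoding='utf-8') as dfile:
    contents = dfile.read()
```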

Rohit Jain
  • Works good, however the `£` on the discount one comes out as: ¢35.00 as it's picking up the 2 as part of the ascii... – Ryflex Sep 27 '13 at 20:25
  • I started recommending `io.open()` instead of `codecs.open()` as that'll continue to work the same on Python 3. The API is much the same as `codecs.open()` but offers more options, including newline handling. – Martijn Pieters Sep 27 '13 at 20:33
  • @MartijnPieters. Oh! Will take a look at it. Didn't know about that. – Rohit Jain Sep 27 '13 at 20:41
  • @Hyflex - you've mistyped your input strings in the `x` list. `\xa3` is the British currency symbol £, `\xa2` is the cent symbol ¢. – Robᵩ Sep 27 '13 at 20:57
  • Thank you ever so much guys, I've got another problem now again because of these. I'll edit the main post. – Ryflex Sep 27 '13 at 21:15
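
On the follow-up in the EDIT: the comparison there can never succeed, because `dfiler` is a single string while `x` is a list, and the final write goes to `dfile` (opened for reading) rather than `result_end_datafile`. One possible way to do it, as a sketch only: join the list first, compare that text with the decoded file contents, and write only on a mismatch. The helper name `save_if_changed`, the shortened list and the temporary path are assumptions made for this example.

```python
import io
import os
import tempfile


def save_if_changed(path, lines, encoding='utf-8'):
    """Write the joined lines to path only if the file's text differs.

    Hypothetical helper; returns True when the file was (re)written.
    """
    new_text = u'\n'.join(lines)

    # Read the current contents as unicode, if the file exists at all.
    old_text = None
    if os.path.exists(path):
        with io.open(path, 'r', encoding=encoding) as infile:
            old_text = infile.read()

    # Compare text with text, not text with the list.
    if old_text == new_text:
        return False

    with io.open(path, 'w', encoding=encoding) as outfile:
        outfile.write(new_text)
    return True


# Usage with a scratch file and shortened sample data.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
x = [u'Loc ', u'352', u'Student Total \xa3540.00', u'Discount \xa335.00', u'\n']

first = save_if_changed(path, x)   # no file yet, so it is written
second = save_if_changed(path, x)  # contents now match, nothing is written
```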