
I have run into an encoding problem in Python 2.7. I am already decoding my input as UTF-8, but I still get this exception:

"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 81: ordinal not in range(128)"
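
For reference, here is a minimal sketch that reproduces the same kind of error. It assumes the failure comes from unicode text being encoded with the ASCII codec, which is what Python 2 does by default when no encoding is given. The sample string is taken from the ad group name in the data; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# u'\xe4' is the character 'ä', as in the ad group name "Kesäkaverit".
text = u'Kes\xe4kaverit'

try:
    # Python 2 performs this step implicitly whenever unicode text is
    # forced into a byte string without an explicit codec, for example
    # str(text) or writing unicode to a file opened with plain open().
    text.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)

# Encoding explicitly with UTF-8 succeeds: 'ä' becomes the bytes C3 A4.
encoded = text.encode('utf-8')
```

The position reported in the message is the index of the first character that ASCII cannot represent, so "position 81" in the traceback points at a character inside a longer string.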

My files contain many characters like this and, for some reason, I'm not allowed to delete them. For example:

desktop,[Search] Store | Automated Titles,google / cpc,Titles > Kesäkaverit,275285048,13

I have tried the methods below to avoid the error, but none of them fixed it. Can anyone help?

1. Putting "#!/usr/bin/python" in my file header.

2. Calling setdefaultencoding:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

3. Decoding the downloaded content explicitly:

    content = unicode(s3core.download_file_to_memory(S3_PROFILE, S3_RAW + file), "utf-8", "ignore")

My code is below:

    content = unicode(s3core.download_file_to_memory(S3_PROFILE, S3_RAW + file), "utf8", "ignore")
    rows = content.split('\n')[1:]
    for row in rows:
        if not row:
            continue

        try:
            # fetch variables
            cols = row.rstrip('\n').split(',')
            transaction = cols[0]
            device_category = cols[1]
            campaign = cols[2]
            source = cols[3].split('/')[0].strip()
            medium = cols[3].split('/')[1].strip()
            ad_group = cols[4]
            transactions = cols[5]

            data_list.append('\t'.join(
                ['-'.join([dt[:4], dt[4:6], dt[6:]]), country, transaction, device_category, campaign, source,
                 medium, ad_group, transactions]))

        except:
            print 'ignoring row: ' + row
user2953788
  • Post the code where you read data. – Mufeed Jul 06 '18 at 12:51
  • Possible duplicate of [Unicode (UTF-8) reading and writing to files in Python](https://stackoverflow.com/questions/491921/unicode-utf-8-reading-and-writing-to-files-in-python) - the top voted answer is best – FHTMitchell Jul 06 '18 at 12:53
  • Thanks, the code has been attached. – user2953788 Jul 06 '18 at 12:56
  • Check [this link](https://stackoverflow.com/a/26605823/6104077) also – Sushant Jul 06 '18 at 13:02
  • Learn the difference between byte strings and Unicode: [unipain](https://nedbatchelder.com/text/unipain.html) and [What every program should know about Unicode](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/). Switch to Python 3 where implicit conversion between the two is not supported. – Mark Tolonen Jul 07 '18 at 06:03
  • Your #1 only matters if you use non-ASCII string literals in the source code. No effect. Your #2 is asking for trouble (see [setdefault encoding will break code](https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/)). #3 I can't judge because it isn't an [mcve]. – Mark Tolonen Jul 07 '18 at 06:09
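
The advice in the comments (keep byte strings and unicode apart, and avoid setdefaultencoding) could look roughly like the sketch below. It is not a confirmed fix, because the question does not show where data_list is written out, which is the likely place the error is raised. The function names parse_rows and write_rows and the output path are hypothetical; the sketch decodes once on input, works only with unicode text in between, and encodes explicitly on output using io.open, which accepts an encoding argument in both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io


def parse_rows(raw_bytes):
    """Decode the downloaded bytes once, then handle only unicode text.

    parse_rows is a hypothetical helper, not part of the original code.
    """
    content = raw_bytes.decode('utf-8')
    lines = []
    for row in content.split(u'\n')[1:]:  # skip the header line
        if not row:
            continue
        cols = row.split(u',')
        # Joining with a unicode separator keeps the result unicode.
        lines.append(u'\t'.join(cols))
    return lines


def write_rows(path, lines):
    """Encode explicitly on output instead of relying on the ASCII default.

    write_rows is a hypothetical helper; path is any writable file path.
    """
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(u'\n'.join(lines) + u'\n')
```

With this layout, the bytes returned by s3core.download_file_to_memory would be passed to parse_rows, and the resulting list would go to write_rows. If the rows are sent somewhere other than a file, the same idea applies: call .encode('utf-8') at the point where the text leaves the program.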
