
I read the answer to this question and am still getting the error AttributeError: 'dict' object has no attribute 'encode'.

I've tried

dic = pickle.load(fileObject)
for v in dic:
    v.encode('ascii', 'ignore')

and

dic = pickle.load(fileObject)
for key, val in dic.iteritems():
    val.encode('ascii', 'ignore')

and still get the same error. When I print out the variables, they all display with a u at the front. The dictionary was pickled under Python 3 and is being unpickled in Python 2.

I tried pp.pprint((dataFromPrevMod).encode('ascii', 'ignore')) and it didn't work.

If I pprint the dictionary, it shows its contents, but in Python 2 each string starts with a u, for example u'website': u'example.org'

Dictionary pretty printed in Python 3

{
        'output': {
                'table': 'intersection',
                'file_location': '\\\\storage1\\tpn\\tpn_team\\dev\\asmithe\\',
                'schema': 'asmithe',
                'temporary_location': '\\\\storage1\\tpn\\tpn_team\\dev\\asmithe\\'
        },
        'tpn_inventory_db_r': {
                'generic_pwd': '51f3tlNE26',
                'db_name': 'tpn',
                'user': 'asmithe',
                'schema': 'asmithe',
                'host': 'example.tpns.org'
        },
        'proj_year': '2005',
        'proj_rules_r': 'C:\\asmithe\\rules.txt',
        'incidents_db_r': {
                'schema': 'tpn_pp_dist',
                'generic_pwd': '51f3tlNE26',
                'db_name': 'tpn',
                'user': 'asmithe',
                'fire_table': 'incident',
                'host': 'example.tpns.org'
        },
        'plots_to_project_r': 'C:\\Users\\asmithe\\Plots.txt',
        'tpn_proj_db_r': {
                'generic_pwd': '51f3tlNE26',
                'db_name': 'tpn inventory',
                'user': 'asmithe',
                'schema': 'test',
                'host': 'example.tpns.org'
        }
}

Dictionary pretty printed in Python 2 (notice the addition of u)

{   u'incidents_db_r': {   u'db_name': u'tpn',
                                u'fire_table': u'incident',
                                u'generic_pwd': u'51f3tlNE26',
                                u'host': u'example.tpns.org',
                                u'schema': u'tpn_pp_dist',
                                u'user': u'asmithe'},
    u'tpn_inventory_db_r': {   u'db_name': u'tpn',
                                 u'generic_pwd': u'51f3tlNE26',
                                 u'host': u'example.tpns.org',
                                 u'schema': u'asmithe',
                                 u'user': u'asmithe'},
    u'tpn_proj_db_r': {   u'db_name': u'tpn inventory',
                            u'generic_pwd': u'51f3tlNE26',
                            u'host': u'example.tpns.org',
                            u'schema': u'test',
                            u'user': u'asmithe'},
    u'output': {   u'file_location': u'\\\\storage1\\tpn\\tpn_team\\dev\\asmithe\\',
                   u'schema': u'asmithe',
                   u'table': u'intersection',
                   u'temporary_location': u'\\\\storage1\\tpn\\tpn_team\\dev\\asmithe idek\\'},
    u'plots_to_project_r': u'C:\\Users\\asmithe\\Plots.txt',
    u'proj_rules_r': u'C:\\asmithe\\rules.txt',
    u'proj_year': u'2005'}
Celeritas
  • Note: The first for loop iterates over the keys, so `v` might not be the best variable name. What are `type(key)` and `type(val)`? It currently looks like there's a nested dictionary in there. – dhke Aug 20 '15 at 19:09
  • Also, can you show what `print(val)` outputs? – Anand S Kumar Aug 20 '15 at 19:09
  • Are you sure the values are not already unicode strings? Because that is the default in Python 3. But the first problem is that `data` is not what you think it is... Do a nice `import pprint` `pprint.pprint(data)` – Maresh Aug 20 '15 at 19:12
  • @Maresh no no, they ARE in unicode but when I do pprint they have extra u's added to them. That's what I'm trying to do, get rid of the u's. – Celeritas Aug 20 '15 at 19:13
  • Also, remember that Python strings are immutable. Calling something like `val.encode(whatever)` creates and returns a *new string* to you, leaving the original unmodified. In this code, you ignore that new string value, which probably isn't what you want. – bgporter Aug 20 '15 at 19:13
  • @Maresh - as noted in the tag, the OP is using Python 2. – TigerhawkT3 Aug 20 '15 at 19:13
  • @TigerhawkT3 He mentioned the data has been pickled from Python 3, that's why I ask. @Celeritas, the pprint is just to show us what `data` is exactly, because from the error you report it sounds like it's actually nested dicts. – Maresh Aug 20 '15 at 19:15
  • @Celeritas, we'd like to see the output of `pprint` or `print` or `type` :) See my previous comment. – Maresh Aug 20 '15 at 19:18
  • *Where* are the `u''`s showing up? They certainly appear in `repr(u'äöü')`, but `print(u'äöü')` should properly encode the unicode to the terminal's encoding. – dhke Aug 20 '15 at 19:18
  • @Maresh - exactly. It's pickled in Python 3, so the OP can indeed be sure that they are Unicode strings. However, Unicode strings in Python 2 have a `u` prefix, so the OP wants to make them _not_ Unicode. – TigerhawkT3 Aug 20 '15 at 19:18
  • Yes. I figured he wanted from bytes to str, my bad. @Celeritas full output pls, `pprint(data)`? From your error message `data` sounds like it is: `{ 'one': {...} }` so you're trying to call `.encode` on a dict, not a string, that's what we want to verify. – Maresh Aug 20 '15 at 19:24
  • @Maresh ok added full output. I'm really not trying to do anything complicated, just use the variables in a dictionary. BTW, I think you're right about the nested thing. So how do I iterate through a nested dictionary and convert everything to ascii? Do I need to convert everything to ascii? I just want to use the values to connect to the database. – Celeritas Aug 20 '15 at 19:38
  • Thx. So that's what I thought, you have nested dicts :) You need to do that recursively and test what the type of the value is. Check my answer – Maresh Aug 20 '15 at 19:51
  • @Celeritas So, did any of the answers below help you? If your problem is solved, please mark it as such :) – Maresh Aug 21 '15 at 19:27
  • @Maresh No, not fully. There are too many commands I don't understand, such as `isinstance` and `self` – Celeritas Aug 21 '15 at 22:47
  • `isinstance` just checks whether the value is a dict or a unicode string. You can just run the code I provided and it will do the trick, because I don't think you need the keys as bytes – Maresh Aug 22 '15 at 10:02

3 Answers


pprint shows unicode strings as u'bla' because in Python 2, str and unicode objects are both sequence types, but they are not the same type.

So it is logical that pprint in Python 2 marks unicode strings with a u prefix rather than displaying them like plain strings.
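
To illustrate, here is a minimal sketch (the variable name value is mine, not from the question). The u only appears in the repr of the string, which is what pprint displays. The text itself is unchanged, and an ASCII-only unicode string compares equal to the plain string:

```python
from __future__ import print_function  # print() behaves the same on Python 2 and 3

# Hypothetical value, shaped like the ones in the question.
value = u'example.tpns.org'

# repr() is what pprint relies on for strings. Python 2 adds the u prefix here;
# Python 3 does not, because every str is already unicode.
print(repr(value))

# print() writes the text itself, so no prefix shows up on either version.
print(value)

# An ASCII-only unicode string compares equal to the plain string.
print(value == 'example.tpns.org')
```

Whether a particular database driver accepts unicode values as they are is something to check against that driver; nothing in the question settles it.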

Roland Smith
  • Does it matter in the sense I just want to use the values to connect to the database with? – Celeritas Aug 20 '15 at 19:39
  • @Celeritas - Have you tried connecting to the database with these values yet? – TigerhawkT3 Aug 20 '15 at 20:00
  • @Celeritas It depends on the format of the data in the database. If you encode the string as UTF-8 and the database uses UTF-16, it's probably not going to match. But you'd have to encode the unicode text in some way. – Roland Smith Aug 20 '15 at 21:18

You have nested dicts, so you need something like this (done very quickly just to give you an idea):

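def unicode_to_bytes(d):
    # Walk the dict in place; only values are replaced, so its size never changes.
    for k in d:
        if isinstance(d[k], dict):
            # Nested dict: recurse into it.
            unicode_to_bytes(d[k])
        elif isinstance(d[k], unicode):
            # encode() returns a new str, so store it back under the same key.
            d[k] = d[k].encode('ascii', 'ignore')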

test = {
    'a': u'b',
    'b': {
        'c': u'c',
        'd': {'e': u'f'}
    }
}

unicode_to_bytes(test)

print test

This is not taking care of keys though.
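
If the keys need converting too, one option is to build a new dictionary instead of modifying the old one while iterating over it. A rough sketch, not tested against your data: encode_nested and text_type are names I made up, and the try/except is only there so the same code also runs on Python 3 (where the results are bytes objects):

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3: every str is already unicode


def encode_nested(obj, encoding='ascii', errors='ignore'):
    """Return a copy of obj with all unicode keys and values encoded."""
    if isinstance(obj, dict):
        # Build a fresh dict, so nothing is deleted or added during iteration.
        return {encode_nested(k, encoding, errors): encode_nested(v, encoding, errors)
                for k, v in obj.items()}
    if isinstance(obj, text_type):
        # errors='ignore' silently drops characters the encoding cannot represent.
        return obj.encode(encoding, errors)
    # Anything else (numbers, None, ...) is returned unchanged.
    return obj


test = {
    u'a': u'b',
    u'b': {
        u'c': u'c',
        u'd': {u'e': u'f'}
    }
}

print(encode_nested(test))
```

Because it returns a new dict, the original stays untouched, and you have to use the return value.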

Hope it helps.

Maresh
  • How about `d[str(k)] = str(d[k]); del d[k]`? – TigerhawkT3 Aug 20 '15 at 19:53
  • You'd get *RuntimeError: dictionary changed size during iteration* with the `del`, and also *UnicodeEncodeError: 'ascii' codec can't encode character* if you don't specify that you want to ignore non-ASCII characters. – Maresh Aug 20 '15 at 19:58
  • Yes, I guess there are many ways to approach this thing, let's let him figure out the rest on his own :-P – Maresh Aug 20 '15 at 20:01

For the sake of completeness: It's astoundingly complex to write a custom pickler that automatically encodes unicode objects to strings. The below is for Python 2 only:

import pickle
import sys
from StringIO import StringIO

class EncodingUnpickler(pickle.Unpickler):
    def __init__(self, f, encoding=None):
        pickle.Unpickler.__init__(self, f)
        # we don't want to modify the class variable from pickle.Unpickler
        self.dispatch = self.dispatch.copy()
        self.dispatch[pickle.UNICODE] = EncodingUnpickler.load_unicode
        self.dispatch[pickle.BINUNICODE] = EncodingUnpickler.load_binunicode
        self.encoding = encoding or sys.getdefaultencoding()

    def load_binunicode(self):
        pickle.Unpickler.load_binunicode(self)
        self.append(self.stack.pop().encode(self.encoding))

    def load_unicode(self):
        pickle.Unpickler.load_unicode(self)
        self.append(self.stack.pop().encode(self.encoding))


d = { u'1': u'a', u'2': u'b' }
s = pickle.dumps(d)
unp = EncodingUnpickler(StringIO(s))
du = unp.load()
print du
dhke