Efficient way of dealing with MongoDB objects in Python

Question

I'm trying to build a very simple MongoDB extraction script, but I stumbled on a problem: most key names and strings are stored as unicode. So when I print a document, it comes out with keys and values like u'username': u'christian'.

And my documents in this database are rather large and complex, so I have several nested levels.

I searched around for a way to convert unicode keys and values to ASCII but came up with nothing.

I'm trying to convert all keys and values like so:

# Python 2 code (uses the unicode type)
def convert2ascii(mydict):
    # Loop over a snapshot of the items, because keys are replaced inside the loop
    for k, v in list(mydict.items()):
        newk = k.encode('ascii', 'ignore')
        mydict[newk] = mydict.pop(k)
        if isinstance(v, unicode):
            mydict[newk] = v.encode('ascii', 'ignore')
        elif isinstance(v, dict):
            convert2ascii(v)
        # elif isinstance(v, list):  # todo
        #     convert2ascii(v)

But I keep running into some scenario the function doesn't cover (like a list of dictionaries, or just a plain list of strings) and having to add each of those to the function, and by now it is rather ugly.

Any ideas on how I can make that simpler?

Why do you think Unicode keys are a problem? They are not, and you should definitely not encode them to anything. — Daniel Roseman, Apr 25 '18 at 20:48
I simply want to get rid of the u'' everywhere. I cannot deliver the exported file like this to another department, as it will cause problems for them when importing the file. — Christian Dechery, Apr 25 '18 at 20:58
But the output from Mongodb is not a file. If you want a file, then use a file serialisation format like JSON. That will avoid any need to encode anything. — Daniel Roseman, Apr 25 '18 at 21:04
Yeah, but HOW? This is exactly what I want. To output it simply as JSON. But even that I couldn't easily do because of the unicode format of the documents. — Christian Dechery, Apr 25 '18 at 23:10
Use the built in json library: `json.dumps(my_data)`. No need to convert any Unicode. — Daniel Roseman, Apr 26 '18 at 05:46
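The suggestion in the comments above can be sketched roughly as follows. This is a minimal example, assuming Python 3 and a hypothetical document `doc` standing in for one returned by a query. `default=str` is an assumption added here as a fallback for values the `json` module cannot encode on its own, such as `datetime` (an `ObjectId` would be turned into its string form the same way).

```python
import json
from datetime import datetime

# Hypothetical document standing in for one fetched from MongoDB.
doc = {
    "username": "christian",
    "tags": ["admin", "staff"],
    "profile": {
        "city": "São Paulo",
        "created": datetime(2018, 4, 25, 20, 48),
    },
}

# Nested dicts and lists are handled by json itself, no manual recursion needed.
# default=str is only called for values json cannot serialize natively.
text = json.dumps(doc, default=str, indent=2, sort_keys=True)
print(text)
```

With the default `ensure_ascii=True`, non-ASCII characters are written as `\uXXXX` escapes, so the output is plain ASCII with no `u''` prefixes. `json.dump(doc, f, default=str)` writes the same text to an open file. PyMongo also ships `bson.json_util`, whose `dumps` understands BSON types, which may be a better fit if the types need to survive a round trip.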

innicoder · Answer 1 · 2018-04-25T20:54:32.263

Easy answer: Pickle. It is a quick and easy way to serialize and save most Python objects.

import pickle

# pickle converts a Python object into a byte stream;
# save/load below pickle and unpickle it to and from a .pickle file
def save(obj, filename='file'):
    with open(f'{filename}.pickle', 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(filename):
    with open(f'{filename}.pickle', 'rb') as f:
        return pickle.load(f)
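A quick round-trip check, as a sketch only: it assumes a hypothetical nested document `doc` and uses `pickle.dumps` / `pickle.loads`, the in-memory counterparts of the `dump` / `load` calls above, so nothing is written to disk.

```python
import pickle
from datetime import datetime

# Hypothetical nested document, similar in shape to a MongoDB result.
doc = {
    "username": "christian",
    "created": datetime(2018, 4, 25, 20, 48),
    "orders": [{"id": 1, "items": ["a", "b"]}, {"id": 2, "items": []}],
}

# Serialize to bytes, then rebuild the object from those bytes.
blob = pickle.dumps(doc, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

print(restored == doc)            # nested structure is preserved
print(type(restored["created"]))  # datetime stays a datetime, not a string
```

Unlike JSON, the result keeps Python-specific types intact, but the bytes are only useful to another Python program.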

Pickle does have downsides:

Pickle is unsafe because it can construct arbitrary Python objects by invoking arbitrary functions.

Pickle isn't human-readable.

Pickle isn't language-agnostic.

Pickle is slow.

These downsides are addressed: HERE
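To make the "unsafe" point concrete, here is a harmless sketch. The class `Payload` is hypothetical and exists only for illustration. Its `__reduce__` method tells pickle which callable to invoke when the data is loaded; here that callable is the built-in `sorted`, but whoever produces the pickle chooses it.

```python
import pickle


class Payload:
    # __reduce__ returns (callable, args); pickle stores both and
    # calls callable(*args) when the data is unpickled.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload: it runs sorted([3, 1, 2])
# and hands back whatever that call returned.
result = pickle.loads(blob)
print(result)
print(isinstance(result, Payload))
```

Because the callable is picked by the data rather than by the code that loads it, only unpickle files that come from a source you trust.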
