I'm trying to build a very simple MongoDB extraction script, but I've run into a problem: most key names and strings come back as unicode. When I print a document, the keys and values look like u'username': u'christian', etc.

The documents in this database are rather large and complex, with several nested levels.

I searched around a bit for a way to convert the unicode keys and values to ASCII but came up with nothing.

I'm trying to convert all keys and values like so:

def convert2ascii(mydict):
    # Python 2: items() returns a list copy, so the dict can be mutated inside the loop
    for k, v in mydict.items():
        newk = k.encode('ascii', 'ignore')
        mydict[newk] = mydict.pop(k)
        if isinstance(v, unicode):
            mydict[newk] = v.encode('ascii', 'ignore')
        elif isinstance(v, dict):
            convert2ascii(v)
        #elif isinstance(v, list):  # todo
        #    convert2ascii(v)

But I keep running into some uncovered scenario (like a list of dictionaries, or just a plain list of strings) and having to add each of those cases to that function, and by now it is rather ugly.

Any ideas on how I can make that simpler?

Christian Dechery
  • Why do you think Unicode keys are a problem? They are not, and you should definitely not encode them to anything. – Daniel Roseman Apr 25 '18 at 20:48
  • I simply want to get rid of the u'' everywhere. I cannot deliver the exported file like this to another department, as it will cause problems for them when they import the file. – Christian Dechery Apr 25 '18 at 20:58
  • But the output from Mongodb is not a file. If you want a file, then use a file serialisation format like JSON. That will avoid any need to encode anything. – Daniel Roseman Apr 25 '18 at 21:04
  • Yeah, but HOW? This is exactly what I want. To output it simply as JSON. But even that I couldn't easily do because of the unicode format of the documents. – Christian Dechery Apr 25 '18 at 23:10
  • Use the built in json library: `json.dumps(my_data)`. No need to convert any Unicode. – Daniel Roseman Apr 26 '18 at 05:46
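
To illustrate the suggestion in the comments, here is a minimal sketch of dumping a nested document with the standard `json` module. The `doc` dictionary and the `to_json` helper are hypothetical stand-ins for a document returned by a MongoDB query, and `default=str` is an assumption: it simply stringifies values that `json` cannot serialize on its own (for example `datetime`, or an `ObjectId` in a real document). It is written for Python 3; the `json.dumps` call itself is the same in Python 2.

```python
import json
from datetime import datetime

# Hypothetical document standing in for one returned by a MongoDB query.
doc = {
    "username": "christian",
    "tags": ["a", "b"],
    "profile": {"city": "São Paulo", "logins": [{"ip": "10.0.0.1"}]},
    "created": datetime(2018, 4, 25, 20, 48),
}


def to_json(document):
    # json.dumps walks nested dicts and lists by itself, so no manual
    # recursion is needed. default=str (an assumption) converts values
    # that json cannot handle natively into their string form.
    return json.dumps(document, default=str, sort_keys=True)


json_text = to_json(doc)
print(json_text)
```

The resulting text contains no u'' prefixes, because those belong to Python's repr of a string and not to the data. Writing `json_text` to a file gives an export that other tools can import as ordinary JSON.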

1 Answer

Easy answer: pickle. It is a quick and easy way to serialize and save most Python objects.

import pickle

# Serialize with the highest pickle protocol available.
# save() pickles an object to disk; load() reads it back.
# Requires Python 3.6+ because of the f-strings.
def save(obj, filename='file'):
    with open(f'{filename}.pickle', 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(filename):
    with open(f'{filename}.pickle', 'rb') as f:
        return pickle.load(f)

Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions, so never unpickle data from an untrusted source.

Pickle isn't human-readable.

Pickle isn't language-agnostic.

Pickle is slow.

Downsides of pickle addressed: HERE

innicoder