
I saved a default dictionary (`defaultdict`) to a file using Python (so it's a string now) because I thought it would be more convenient, but now it's stuck as a string, and `ast.literal_eval(my_string)` doesn't work unless I slice the "defaultdict" wording out. How can I retrieve my dictionary from this file more elegantly than slicing out the defaultdict notation and using `ast.literal_eval`? Thank you!

JTFouquier
  • Not sure if it's too late, but using `pickle` for saving and loading would make your life easy. – Julien Dec 04 '15 at 00:46
  • If you trust the file (that is, if no one else has the ability to change it), you could just use Python's `eval` (this is [dangerous](http://stackoverflow.com/questions/3513292/python-make-eval-safe) if, for example, the file is uploaded by a user). – David Robinson Dec 04 '15 at 00:47
  • Haha. Yes, I just learned about the pickle feature, but I saved a bunch of files as I explained and I feel like there has to be a nice way to fix it. – JTFouquier Dec 04 '15 at 00:47
  • How did you save it? (With what code?) – TessellatingHeckler Dec 04 '15 at 00:48
  • I just wrote it to a file as a string: `file.write(str(my_dict))`. I usually save things in tab-delimited files so I can read them easily and open them in Excel, but I thought saving it as a dict would save me time later. – JTFouquier Dec 04 '15 at 00:49
  • @jennifer06262016 Try `my_dict = eval(file.read())`. Again, don't do this if the files are uploaded by someone else (or they could insert arbitrary code), but it sounds like they were just saved by you. – David Robinson Dec 04 '15 at 00:51
  • @DavidRobinson: Actually, that doesn't work. `eval()` simply runs the code, so if you type `defaultdict(<class 'list'>, {})` into the Python shell, for example, it just raises a `SyntaxError` instead of creating a new `defaultdict`. So `pickle` is the best choice. – Remi Guan Dec 04 '15 at 00:54
  • The [`json` library](https://docs.python.org/2/library/json.html) is much better than `pickle`, in my opinion. – pzp Dec 04 '15 at 02:14
  • Just so I can learn from this: what's the reason for the downvotes on this question? There is obviously not a straightforward answer, and I think that doing something incorrectly and trying to solve it is worthy of a question. I realize there is a better approach going forward. Thoughts? Thx! – JTFouquier Dec 04 '15 at 18:33
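The comments suggest `pickle` or `json` for future saves. A minimal Python 3 sketch of the difference, assuming a `defaultdict(list)` with string keys (the sample data is made up): `pickle` brings back the `defaultdict` together with its default factory, while `json` returns a plain `dict` that you have to re-wrap yourself.

```python
import json
import pickle
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

# pickle stores the defaultdict itself, including its default_factory
restored = pickle.loads(pickle.dumps(d))
print(type(restored).__name__, restored.default_factory)  # defaultdict <class 'list'>

# json stores only the key/value data, so it comes back as a plain dict
# and the defaultdict has to be rebuilt by hand
plain = json.loads(json.dumps(d))
rebuilt = defaultdict(list, plain)
print(type(plain).__name__, rebuilt['missing'])  # dict []
```

One trade-off: JSON files stay human-readable, but JSON object keys are always strings, so integer keys come back as strings and tuple keys cannot be dumped at all.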


As Kevin Guan notes in the comments, the repr of a `defaultdict` is not the same as the code you would use to create a fresh one from scratch. It includes the repr of the default factory, so a `defaultdict(list)` with no entries stringifies as `defaultdict(<class 'list'>, {})`, and `<class 'list'>` isn't legal Python syntax.
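To see the problem concretely, here is a small Python 3 sketch (the sample data is made up) that writes a `defaultdict` to a string the same way the question did, with `str()`, and then tries to parse it back. Under Python 2 the factory would show up as `<type 'list'>` instead, which is equally unparseable.

```python
import ast
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

text = str(d)  # what file.write(str(my_dict)) stores
print(text)    # defaultdict(<class 'list'>, {'a': [1]})

# Neither literal_eval nor eval can parse the "<class 'list'>" part;
# both report SyntaxError
for parse in (ast.literal_eval, eval):
    try:
        parse(text)
    except (ValueError, SyntaxError) as exc:
        print(parse.__name__, 'failed:', type(exc).__name__)
```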

There is no nicer way to handle this than string manipulation; there is no general way to un-repr something when the repr can't be eval-ed. If there is only one defaultdict per file, you could read back the bad data and write out a good pickle with something like:

    import ast, collections, pickle

    # Placeholder: replace with the files you wrote with file.write(str(my_dict))
    allfiles = []

    for filename in allfiles:
        # Slurp the file
        with open(filename) as f:
            data = f.read()

        # Slice from the first { to the last } and convert to a dict (assumes
        # all keys/values are Python literals too)
        data = ast.literal_eval(data[data.index('{'):data.rindex('}') + 1])

        # Convert to defaultdict; data is already a dict, so pass it directly
        # Assumes they're all defaultdicts of list, and that you need them to
        # remain defaultdicts; change list to whatever factory you actually used
        newdefdict = collections.defaultdict(list, data)

        # Rewrite the input file as a pickle containing the recovered data.
        # To keep the original until you've verified the result, use
        # open(filename + ".pickle", ... instead
        with open(filename, 'wb') as f:
            pickle.dump(newdefdict, f, protocol=pickle.HIGHEST_PROTOCOL)

The `pickle.HIGHEST_PROTOCOL` argument selects the most up-to-date protocol. If you need to interoperate with older versions of Python, choose the highest protocol that works on all of them; for Python 2 and Python 3 compatibility, that's protocol 2.
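To check the conversion on a single string before rewriting any files, the same slicing logic can be wrapped in a small function. This is a sketch: the name `recover_defaultdict` and the sample string are hypothetical, and it assumes one defaultdict per string with literal keys and values. It also does a protocol-2 round trip in memory with `pickle.dumps`/`pickle.loads` instead of on disk.

```python
import ast
import collections
import pickle


def recover_defaultdict(text, factory=list):
    """Hypothetical helper: rebuild a defaultdict from str(my_dict) output.

    Assumes every key and value is a Python literal and that `factory`
    is the default factory originally used.
    """
    body = text[text.index('{'):text.rindex('}') + 1]  # keep only the {...} part
    return collections.defaultdict(factory, ast.literal_eval(body))


saved = "defaultdict(<class 'list'>, {'x': [1, 2], 'y': []})"
recovered = recover_defaultdict(saved)

# Protocol 2 is the highest protocol that both Python 2 and Python 3 can read
blob = pickle.dumps(recovered, protocol=2)
loaded = pickle.loads(blob)
print(loaded['x'], loaded['missing'])  # [1, 2] []
```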

ShadowRanger