
I've parsed a big corpus and saved the data I needed in a dictionary. At the end of my code I wrote it out as a .txt file because I needed to check something manually. Now, in another part of my work, I need that dictionary as input. Are there other ways than opening the text file and rebuilding the dictionary from it? Could I just change my other code so that it keeps the dictionary as it is? Is pickle the right thing for my case, or am I totally on the wrong track? Sorry if my question is naive; I'm really new to Python and still learning it.

Pari
  • pickle is the way to go – gkusner Jul 31 '14 at 12:44
  • `json` may be more appropriate if human readability is important. You can't "manually check something" by opening a pickle file in Notepad. – Kevin Jul 31 '14 at 12:45
  • @json No, I already did that. I mean I've parsed the corpus and now I have my dictionary as a text file; having done it once is enough. I can keep it, and now maybe I can rewrite my other code to parse once more and keep the dictionary as it is for the other parts of my work. I don't know if I'm being clear: the manual check was a one-time thing and I got what I wanted. – Pari Jul 31 '14 at 12:48
  • I just don't want to read a text file as input again and convert it to a dictionary, because that seems complicated to me. – Pari Jul 31 '14 at 12:51
  • What do you mean by "now I have my dictionary as text file"? How did you save the dictionary? –  Jul 31 '14 at 12:55
  • @Tichodroma As a .txt file. Now I want to change my code so that it saves the dictionary as it is. – Pari Jul 31 '14 at 13:00
  • There is no standard way to save a dictionary as a text file in Python. So *how* did you do it? –  Jul 31 '14 at 13:06

1 Answer


Copying and pasting from "Pickle or json?" for ease of reading:

If you do not have any interoperability requirements (i.e. you're just going to use the data with Python), and a binary format is fine, go with cPickle, which gives you really fast Python object serialization.

If you want interoperability, or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).

Going by the above, I guess you would prefer cPickle over json.
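To make the two options concrete, here is a minimal sketch of saving a dictionary and loading it back with each module. The dictionary contents and file names are made up for illustration, and the code assumes Python 3, where `cPickle` no longer exists as a separate module and plain `pickle` uses the C implementation when it is available.

```python
import json
import os
import pickle
import tempfile

# Hypothetical stand-in for the dictionary built from the corpus.
word_counts = {"corpus": 3, "python": 7, "pickle": 1}

# Write into a temporary directory so the sketch leaves no files behind
# in the working directory; the file names are illustrative only.
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "word_counts.json")
pickle_path = os.path.join(out_dir, "word_counts.pkl")

# JSON: a text format that can be opened and checked in any editor.
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(word_counts, f, indent=2)
with open(json_path, encoding="utf-8") as f:
    from_json = json.load(f)

# pickle: a binary, Python-only format, so the files are opened in "wb"/"rb" mode.
with open(pickle_path, "wb") as f:
    pickle.dump(word_counts, f)
with open(pickle_path, "rb") as f:
    from_pickle = pickle.load(f)

print(from_json == word_counts, from_pickle == word_counts)
```

One difference worth knowing before choosing: JSON object keys are always strings, so a dictionary with integer keys comes back with string keys, and tuple keys cannot be written at all. pickle restores those key types unchanged, but a pickle file should only be loaded if it comes from a source you trust.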

However, I found another interesting article, http://kovshenin.com/2010/pickle-vs-json-which-is-faster/, which reports that json is a lot faster than pickle (the author states in the article that cPickle is faster than pickle but still slower than json).

This SO answer, "What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?", compares 6 different libraries:

  • pickle
  • cPickle
  • json
  • simplejson
  • ujson
  • yajl

In addition, if you use PyPy, json can be really fast.

Finally, some very recent profiling data: https://gist.github.com/schlamar/3134391.
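Since the linked benchmarks disagree with each other and depend on the Python version and on the shape of the data, it may be easier to time the two modules on your own dictionary. The following is only a rough sketch using the standard `timeit` module; the sample dictionary and the `average_seconds` helper are hypothetical, and only the standard-library `json` and `pickle` are measured, not the third-party libraries listed above.

```python
import json
import pickle
import timeit

# Hypothetical sample data; replace it with the real dictionary.
data = {"word%d" % i: i for i in range(10000)}

# Serialize once up front so the load timings have something to read.
json_blob = json.dumps(data)
pickle_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)


def average_seconds(func, number=20):
    """Return the average time of one call to func over `number` calls."""
    return timeit.timeit(func, number=number) / number


results = {
    "json dump": average_seconds(lambda: json.dumps(data)),
    "json load": average_seconds(lambda: json.loads(json_blob)),
    "pickle dump": average_seconds(
        lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    ),
    "pickle load": average_seconds(lambda: pickle.loads(pickle_blob)),
}

for name, seconds in results.items():
    print("%-12s %.6f s" % (name, seconds))
```

The numbers will vary between machines and runs, so treat them as a comparison for your own data rather than a general ranking.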

pochen
  • Thanks for your answer. Actually, I don't mind whether it's fast or not; I'm trying to find a clearer way that is easier for a beginner to work with. Now I'm going to review what you have suggested. – Pari Jul 31 '14 at 13:05
  • Wish you the best of luck, @Pari! P.S. I would just go with json since it's generally available, human readable and quite performant. – pochen Jul 31 '14 at 13:07