14

I've made a class in which I want to keep track of stats of students. I intend to make a GUI later to manipulate this data.

My main question is: what is the best way to save and later retrieve this data?

I've read about pickle and JSON, but I don't really get how they work (especially about how they save the data, like in which format and where).

Ulrich Eckhardt
Lennart Wijers
  • 2
    Any reason not to use sqlite, which is shipped with Python? – XORcist Jan 24 '13 at 19:42
  • Storing and retrieving of data is called "serialization", I took the liberty to add that to the tags. Using a websearch (or the Powerz of the Lazyweb as below), you will find lots of info on that topic. – Ulrich Eckhardt Jan 24 '13 at 20:31

5 Answers

20

If your data is fairly simple, such as nested collections of strings or numbers, I would use JSON. JSON is a text representation of simple data types and combinations of them. The json module converts your data to a string; you then write that string to a file yourself.

It's super simple:

>>> my_data = [list(range(5)) for _ in range(5)]  # list() is needed on Python 3, where range objects are not JSON serializable
>>> my_data
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
>>> import json
>>> json.dumps(my_data)
'[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]'

Then just write that string to a file. When you want to reload it, read the string back from the file and parse it like so:

>>> import json
>>> string_from_file
'[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]'
>>> my_saved_data = json.loads(string_from_file)
>>> my_saved_data
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
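The sessions above leave out the file handling itself. Here is a minimal sketch of the whole round trip, assuming a made-up `students` dict and a temporary file path (neither comes from the question). It uses `json.dump` and `json.load`, which write to and read from an open file directly instead of going through an intermediate string:

```python
import json
import os
import tempfile

# Hypothetical example data: student names mapped to lists of grades.
students = {"Alice": [90, 85, 77], "Bob": [60, 72, 88]}

# A throwaway location so the example does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), "students.json")

# json.dump writes the JSON text straight to an open text file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(students, f, indent=2)

# json.load parses the file back into dicts, lists, strings and numbers.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == students)
```

The resulting file is plain text, so you can open it in any editor to see exactly what was saved.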

If your data is more complicated and involves classes other than the built-in collection types, pickle is a better choice. One very important thing to know about pickle is that it has security vulnerabilities: it's a bad idea to unpickle anything you didn't pickle yourself. pickle is vulnerable to the kind of security problems detailed in this article: http://www.kalzumeus.com/2013/01/31/what-the-rails-security-issue-means-for-your-startup/
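As a rough sketch of what that looks like for a class of your own: the `Student` class below is a hypothetical stand-in (the question does not show the real one), and the file goes to a temporary directory.

```python
import os
import pickle
import tempfile


class Student:
    """Hypothetical stand-in for the class described in the question."""

    def __init__(self, name, grades):
        self.name = name
        self.grades = grades

    def average(self):
        return sum(self.grades) / len(self.grades)


roster = [Student("Alice", [90, 80]), Student("Bob", [70, 75])]

path = os.path.join(tempfile.mkdtemp(), "roster.pickle")

# Pickle data is binary, so the file is opened in "wb" / "rb" mode.
with open(path, "wb") as f:
    pickle.dump(roster, f)

# Only load pickle files you created yourself: unpickling can run code.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded[0].name, loaded[0].average())
```

The loaded objects are real `Student` instances again, methods included, but the class definition has to be importable in whatever program does the loading.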

If the size of your data is very large, or you will be saving/loading it frequently, or for any reason using json and saving to a local file is inadequate, then a database will be the way to go.

Brenden Brown
  • I think JSON is a good choice. In terms of complexity, it's somewhere between CSV and XML. I consider CSV a bit too simple for many uses; it's hard to represent "natural" structures in it. The upside of XML is that everything and the kitchen sink can import/export it or use it as a protocol base, so it's a powerful metaformat, but it's also more complicated. – Ulrich Eckhardt Jan 24 '13 at 20:35
  • 2
    Nice thing about JSON is that, unlike Pickle, it is language-agnostic. – JohnJ Jan 24 '13 at 21:36
  • 1
    This answer is the one I would recommend to anyone here. Also, if you have numpy arrays and such, json-tricks is a really good package. – eric Mar 26 '19 at 14:11
  • 1
    Maybe also mention that pickles can be incompatible even between different Python versions. – tripleee Jan 06 '20 at 10:52
11

For persistent data (storing info about students), a database is a good choice. As already mentioned, Python ships with SQLite support (the sqlite3 module), which is often good enough, for development purposes at least.

Using SQLite from Python is easy: just import the library in your source code file and open a connection to your database. Refer to the Python documentation.
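A minimal sketch of those two steps, assuming a made-up `students` table with `name` and `grade` columns (the schema is purely illustrative):

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a filename such as
# "students.db" instead to persist it on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY, grade REAL)")

# "?" placeholders let sqlite3 take care of quoting the values.
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Alice", 8.5), ("Bob", 6.0)],
)
conn.commit()

# Query back only the students at or above a given grade.
rows = conn.execute(
    "SELECT name, grade FROM students WHERE grade >= ? ORDER BY name", (7,)
).fetchall()
print(rows)
conn.close()
```

Unlike JSON or pickle, you don't have to load everything into memory to look up or update a single student.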

EDIT: Found a new tutorial about Python + Sqlite that seems good.

Sami N
2

You may also use the csv module. It depends on what you need.
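For flat, table-like data it could look something like this. The column names and the temporary file path are assumptions made up for the example:

```python
import csv
import os
import tempfile

# Hypothetical rows; values are strings because csv reads everything
# back as text and leaves type conversion to you.
rows = [
    {"name": "Alice", "grade": "8.5"},
    {"name": "Bob", "grade": "6.0"},
]
path = os.path.join(tempfile.mkdtemp(), "students.csv")

# newline="" is what the csv documentation recommends for csv files.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade"])
    writer.writeheader()
    writer.writerows(rows)

# DictReader uses the header line to name the fields of each row.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded == rows)
```

The file can be opened in any spreadsheet program, which is handy if someone else needs to look at the data.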

Arthur Julião
  • 3
    The problem with CSV is that it is unsuitable for structured data. If you can get by with saving just a single two-dimensional array (or a handful of arrays, if you use a separate CSV file for each), CSV is fine; but it's unsuitable for anything with more dimensions or structure within individual fields. – tripleee Jan 06 '20 at 10:54
2

Use a database. SQLAlchemy with SQLite is a good start. You'll end up there eventually anyway.

or

Dump everything out with the pickle module. (there really isn't anything to understand, you save objects and then load them again, it's really simple).

Lennart Regebro
1

You could use pickling, Python's serialization mechanism:

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
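To make the "byte stream" and "object hierarchy" parts concrete, here is a small in-memory sketch using only built-in types. The dictionary and its keys are made up for the illustration:

```python
import pickle

grades = [90, 85]
# Both keys refer to the very same list object.
data = {"midterm": grades, "final": grades}

# dumps() returns the byte stream instead of writing it to a file.
blob = pickle.dumps(data)
print(type(blob))

# loads() rebuilds the structure from those bytes.
restored = pickle.loads(blob)
print(restored == data)

# Shared references inside the structure survive the round trip.
print(restored["midterm"] is restored["final"])
```

`pickle.dump` and `pickle.load` do the same thing with an open binary file instead of a bytes object.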

Óscar López
  • Pickles are fine for short-term local storage, but don't scale well to large amounts of data, network concurrency, and/or different Python versions. – tripleee Jan 06 '20 at 11:01