
I was wondering if Python has a simple method for caching a sequence of values, where the sequence can be updated each time the script is run. For example, let's say I have a list of tuples where each tuple is a datetime and a float. The datetime represents the time a speed was recorded by an anemometer and the float is the speed recorded. When I run my script, new values should be added to my list and remembered the next time I run the script. When I first started programming, the way I solved this was with a pickle, as follows:

import os
import pickle
import datetime

db_path = "speeds.p"

# get all our previous speeds
speeds = []
if os.path.exists(db_path):
    with open(db_path, "rb") as f:
        speeds = pickle.load(f)


def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37)
    )

    for i in data:
        yield i


try:
    # add new speeds
    for t, v in data_from_endpoint():
        if len(speeds) == 0 or t > speeds[-1][0]:
            print(f"Adding {t}, {v}")
            speeds.append((t, v))
finally:
    # save all speeds
    with open(db_path, "wb") as f:
        pickle.dump(speeds, f)

print(f"Number of values: {len(speeds)}")

The way I would solve this now is to use a sqlite database. Both solutions involve a lot of code for something so simple, and I'm wondering if Python has a simpler way of doing this.
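For reference, a minimal sqlite3 sketch of the same update logic (the table and file names here are just placeholders):

```python
import sqlite3
import datetime

new_data = [
    (datetime.datetime(2022, 10, 22, 21, 15), 13),
    (datetime.datetime(2022, 10, 22, 21, 30), 24),
    (datetime.datetime(2022, 10, 22, 21, 45), 37),
]

con = sqlite3.connect("speeds.db")  # placeholder file name
con.execute("CREATE TABLE IF NOT EXISTS speeds (t TEXT PRIMARY KEY, v REAL)")

# PRIMARY KEY on the timestamp plus INSERT OR IGNORE skips
# any readings that were already stored on a previous run
con.executemany(
    "INSERT OR IGNORE INTO speeds VALUES (?, ?)",
    [(t.isoformat(), v) for t, v in new_data],
)
con.commit()

count = con.execute("SELECT COUNT(*) FROM speeds").fetchone()[0]
print(f"Number of values: {count}")
con.close()
```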

Baz
  • You use pickle, and it's 2 lines to read and 2 lines to write. I would say that is a good solution. Personally, I prefer e.g. `json`, so I can human-read the data in the file. But of course, it depends on how much data you handle. With pickle and json, you write all the data each time, and load everything into memory, while sqlite is a bit smarter. – Dr. V Oct 22 '22 at 18:46
  • If you're looking to avoid the overhead of loading the whole pickle file in and writing the whole thing out, you could always try [shelve](https://docs.python.org/3/library/shelve.html), which lets you add one object at a time. – Nick ODell Oct 22 '22 at 18:49
  • I went the opposite direction. Started off with enterprise databases, SQL Server, Oracle etc., and I just use pickle for most things these days. I'd throw yaml in the mix: not quite as simple as pickle, but the files are easily human readable and editable, which is a positive. I'm an old dog though: no new tricks from me. – John M. Oct 22 '22 at 18:49
  • @Nick ODell I'll check out shelve thanks. – Baz Oct 22 '22 at 19:16
  • @JohnM., ...I disagree that YAML is human-editable. Much less so than simpler formats like TOML, at least; see https://www.arp242.net/yaml-config.html for an essay on the subject. The YAML spec is even longer than the XML spec; it's full of carve-outs, exceptions, and duplicative ways to do the same thing. (See also https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-in-yaml-over-multiple-lines/21699210#21699210) – Charles Duffy Oct 22 '22 at 20:13
  • That said -- this boils down to a tool-selection question; arguably off-topic for that reason, as well as on account of being too broad and opinion-centric. – Charles Duffy Oct 22 '22 at 20:17
  • @CharlesDuffy thanks, that makes for some interesting reading. I don't stretch the bounds of yaml very much: I just use it for configuration files for small-scale stuff where I want to be able to read and tweak if necessary. I usually save the data structures from code and tweak from there. I haven't looked into it too much and I wasn't aware of quite how bloated the spec is. For what I use it for it seems concise and legible, but I can see how you could get bogged down. That said, a long long time ago I wrote a very large enterprise website using XML and XSLT; anything seems simple after that. – John M. Oct 22 '22 at 20:49
  • Personally, I'm a fan of XML, but... not at all a fan of XSLT. It's much easier to work with if you're putting, say, XQuery in front of it instead, or mixing templating logic inline with Genshi. – Charles Duffy Oct 22 '22 at 21:26
  • (but this conversation makes the point about storage and serialization options being opinion-centric by nature). – Charles Duffy Oct 22 '22 at 21:28
  • That's very true – John M. Oct 22 '22 at 21:29

3 Answers


You can append to pickle files. I don't know if that is simple enough:

import pickle
import datetime

db_path = "speeds.p"

def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37)
    )

    for i in data:
        yield i

# no need to check for the existence of the pickle file with append mode
with open(db_path, "ab") as f:
    for t, v in data_from_endpoint():
        pickle.dump((t, v), f)  # note: dump() returns None, so there is nothing to assign

# run several times and see how the list gets longer
objs = []
with open(db_path, "rb") as f:
    while True:
        try:
            o = pickle.load(f)
        except EOFError:
            break
        objs.append(o)

print(objs)
Dronakuul

Here is a solution using shelve as suggested by @Nick ODell. Not yet sure if I should be using str(t) below.

import shelve
import datetime

def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37),
    )

    for i in data:
        yield i


with shelve.open("speeds.db") as db:  # writeback isn't needed: values are assigned, never mutated in place

    # add new speeds
    for t, v in data_from_endpoint():
        t = str(t)  # shelve keys must be strings
        if t not in db:
            print(f"Adding {t}, {v}")
            db[t] = v

    print(f"Number of values: {len(db)}")
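On the str(t) question: shelve keys do have to be strings, so some conversion is needed; one option would be t.isoformat() instead of str(t), which gives keys that sort chronologically and can be turned back into datetimes. A rough sketch (the file name is arbitrary):

```python
import shelve
import datetime

new_data = [
    (datetime.datetime(2022, 10, 22, 21, 15), 13),
    (datetime.datetime(2022, 10, 22, 21, 30), 24),
]

# "speeds_iso.db" is an arbitrary file name for this sketch
with shelve.open("speeds_iso.db") as db:
    for t, v in new_data:
        key = t.isoformat()  # e.g. '2022-10-22T21:15:00' -- sorts chronologically
        if key not in db:
            db[key] = v
    # sorted() over the string keys recovers chronological order,
    # and fromisoformat() turns each key back into a datetime
    ordered = [(datetime.datetime.fromisoformat(k), db[k]) for k in sorted(db)]

print(f"Number of values: {len(ordered)}")
```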
Baz

There is no real "standard" way for Python to store data.

Consider that:

  • Your program needs a location for its file.
  • It needs write permissions in that location.
  • If you use the current working directory, it will fail to find the file when it is started from another location.
  • If your program needs to work on multiple platforms, it becomes a lot more complicated.
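As an aside, pathlib can take care of most of the location and permission points above; a minimal json-based sketch (the ".myapp" directory name is made up):

```python
import json
from pathlib import Path

# ".myapp" is a made-up name; Path.home() resolves the home
# directory on both ms-windows and POSIX systems
data_dir = Path.home() / ".myapp"
data_dir.mkdir(parents=True, exist_ok=True)
data_file = data_dir / "speeds.json"

# read existing data, defaulting to an empty list on the first run
speeds = json.loads(data_file.read_text()) if data_file.exists() else []
speeds.append(["2022-10-22T21:15:00", 13])
data_file.write_text(json.dumps(speeds))
```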

This is an excerpt of a program that works both on ms-windows and POSIX systems like FreeBSD and Linux.

import os
import re
import json


home = ""
uname = ""


def load_data():
    """
    Load the program's data file.

    It is located in the user's home directory,
    which is platform specific.
    
    Uses the global variable home,
    which is initialized when the program starts.
    """
    try:
        with open(home + os.sep + "resins.json") as rf:
            lines = rf.readlines()
    except (FileNotFoundError, KeyError):
        with open("resins.json") as rf:
            lines = rf.readlines()
    text = "\n".join([ln.strip() for ln in lines])
    try:
        lm = re.search("// Last modified: (.*)", text).groups()[0]
    except AttributeError:
        lm = None
    nocomments = re.sub("^//.*$", "", text, flags=re.MULTILINE)
    return json.loads(nocomments), lm


if __name__ == "__main__":
    # Platform specific set-up
    if os.name == "nt":
        uname = os.environ["USERNAME"]
        home = os.environ["HOMEDRIVE"] + os.environ["HOMEPATH"]
        if home.endswith(os.sep):
            home = home[:-1]

    elif os.name == "posix":
        uname = os.environ["USER"]
        home = os.environ["HOME"]

    recipes, filedate = load_data()

Roland Smith