concurrent, networked access to a python list

Question

I have a data structure like this:

{
    'key1':[
        [1,1,'Some text'],
        [2,0,''],
        ...
    ],
    ...
    'key99':[
        [1,1,'Some text'],
        [2,1,'More text'],
        ...
    ],
}

The size of this will be only like 100 keys and 100 lists in each key.

I like to store it and retrieve it (the entire list) based on the key. This is for a use in a web-server with not very high traffic. However, the back end must handle concurrent reads and writes.

How to do this in a safe way and without writing too much code?

I suppose storing the pickled object in SQLite is a possible solution.

Are there better ways?

You may be able to use Queues because they are thread-safe. This [related answer](http://stackoverflow.com/questions/6319207/are-lists-thread-safe) may help. — Praveen Gollakota, Apr 09 '12 at 16:11
So the processes should wait for other processes working on the same data before being able to retrieve a list? — Sven Marnach, Apr 09 '12 at 16:14
@bukzor: I think your edit changed the meaning. The OP suggested that storing pickled lists in a SQLite database. Moreover, the OP never asked for networked access. — Sven Marnach, Apr 09 '12 at 16:17
@bukzor: Ad 2) All server processes seem to be running on the same machine, so you don't need networked access. — Sven Marnach, Apr 09 '12 at 16:27
@SvenMarnach: I'd consider an http application to be networked. I might changed "networked" to "web" but I don't see an important difference. WONTFIX. If the OP disagrees, they're obviously free to fix it. — bukzor, Apr 09 '12 at 16:30
Why do you use multiple processes if the server will be low-traffic? Simply use a single server process, and you are done. — Sven Marnach, Apr 09 '12 at 16:37
@bukzor: The question is about the backend of the server processes, which most certainly won't be HTTP. — Sven Marnach, Apr 09 '12 at 16:38

score 0 · Answer 1 · answered Apr 09 '12 at 16:18

The answer is it depends on where the bottleneck is. If your process is I/O bound (i.e. it will not consume a significant amount of CPU time handling all your equests), you should look into an event-driven framework like Twisted. In this case, you can store your data in a normal dictionary, as only one thread will access it at a given time.

If your process is CPU-intensive and you want to take advantage of multiple cores, you need to work with multiple Python processes, since each Python process will hold a GIL (Global Interpreter Lock) and multiple threads cannot execute Python code in the same process at the same time. In this case, a really simple option is to use a shared storage such as memcached.

I believe (he mentions "not very high traffic") that the OP's application is very simple and low-load. The simplest solution you can think of will be most appropriate. — bukzor, Apr 09 '12 at 16:31

score 0 · Answer 2 · answered Apr 09 '12 at 16:21

0

Python pickles are not made for concurrent access. If you can create a single long-lived process, you can simply used a memory structure for operation, and use a single pickle file for persistance. You would need to make sure that you catch signals to ensure that the pickle file gets written!

A more flexible solution is sqlite. The sqlite web page brags: "We are aware of no other embedded SQL database engine that supports as much concurrency as SQLite." The problem here is that you are quite likely to do the database design wrong (no offense intended). I would create a "keys" table and a "lists" table (more meaningful names are recommended!), with a foreign-key pointing from the lists to the keys. Make sure the keys have an index!

If the lists are not fixed-length, you should create a third table to hold those values, with a foreign key pointing back to the lists.

answered Apr 09 '12 at 16:21

bukzor

37,539
11
77
111

OP want concurrent writes, could sqlite3 fulfill this? – okm Apr 09 '12 at 16:23
okm: please read. The link to the sqlite FAQ addresses your question. The answer is simply "yes". – bukzor Apr 09 '12 at 16:26
Yeah, also follow lines after that words: "When any process wants to write, it must lock the entire database file for the duration of its update.". SQLite in-memory is cool but I prefer the memory structure you mentioned for such task – okm Apr 09 '12 at 16:29
Let's not excerpt out of context. The next sentence is: "But that normally only takes a few milliseconds." The overall advice of the FAQ entry is that SQLite is suitable for all but very demanding applications. The concluding sentence is: "Experience suggests that most applications need much less concurrency than their designers imagine." – bukzor Apr 09 '12 at 16:35
I agree w/ the conclusion. Its actually depends on the OP's write TPS then – okm Apr 09 '12 at 16:49

concurrent, networked access to a python list

2 Answers2