4

Some background of the data: some different games are being played, and each of them hosts a number of players. Each game consists of a number of rounds, and during each round each player involved makes an action. What I am trying to do here is to construct a data structure in memory for storing the complete history of individual actions taken by the players across all the games being played.

An obvious structure is a deeply nested dictionary/hashmap, where each game_id is mapped to a number of player_ids, and each player_id is mapped to different round_numbers, and each round_number is mapped to an action.

In other words, game_id:player_id:round_number:action. On the other hand, I can also use game_id:round_number:player_id:action

Problem arises when I try to access the data structures above for different analytical purposes. For example, it's inconvenient to have game_id:player_id:round_number:action, if I want to know all the actions made by players in a particular round of a given game. Conversely, it is equally inconvenient to have game_id:round_number:player_id:action, if I want to know all the actions made by a particular player during the course of a given game. Unfortunately, in my case, I need to ask both those questions frequently.

I wonder if there is a single data structure which can store such data and is convenient for accessing both player-level and round-level data as described above. The implementation will be in Python, if that matters.

EDIT: a few people have recommended in-memory sqlite database to handle such relational queries. However, its performance may be an issue for me, as discussed here: SQLite Performance Benchmark -- why is :memory: so slow...only 1.5X as fast as disk?

Community
  • 1
  • 1
skyork
  • 7,113
  • 18
  • 63
  • 103
  • 1
    Sounds like you want a relational database which lives in memory. Maybe try looking into inMemory SQLite. http://www.sqlite.org/inmemorydb.html – Justin Jun 01 '12 at 22:32
  • Do you only care about looking up via the two examples you give? Or do you want to do arbitrary lookups? – Doug T. Jun 01 '12 at 22:36
  • @DougT. Besides those two cases, I do also want to look up the specific action made by any given player at any given round of any given game. – skyork Jun 01 '12 at 22:47
  • @skyork really sounds ideal for a relational db... – Doug T. Jun 01 '12 at 23:24
  • I'm not sure why you're worried about the performance of SQLite with relation to the link you post. From the answers to that question, it's clear that the 1.5X timing isn't because in-memory is slow, but because the disk-database is being cached in memory and so is almost the same as the in-memory database. – Epcylon Jun 03 '12 at 13:24
  • @Epcylon, could you point me to the actual performance stats of in-memory SQLite? Can it be as fast as accessing entry in a nested dictionary in memory as above? – skyork Jun 04 '12 at 00:13
  • I don't have any actual performance stats to point you to, but it all sounds like premature optimization to me, especially since you're using Python to begin with. Have you tested it and found that using an in-memory SQLite database is too slow? Because the use case you present sounds like a typical relational database is the best solution, and I'd be careful to throw away a good solution without knowing for sure that it won't perform. – Epcylon Jun 04 '12 at 17:45

3 Answers3

3

One way would be to store the data in a dict, but maintain indexes to allow quick access to various views into your data. You could structure this with a class, or just functions. Here is the jist of it (untested):

from collections import defaultdict

game_dict = {}  # keyed by (game, player, round) tuple
game_player_ix = defaultdict(list)
game_round_ix = defaultdict(list)

def add_action(game, player, round):
    game_dict[(game, round, player)] = action # track the action in the main dict
    game_player_ix[(game, player)].append(round)  # keep an index for lookups by player
    game_round_ix[(game, round)].append(player) # another index for lookups by round

def get_all_player_actions(game, player):
    return (game_dict[(game,player,round)] for round in game_round_ix[(game, player)]) # iterator

def get_all_round_actions(game, round):
    return (game_dict[(game,player,round)] for player in game_player_ix[(game, round)]) # iterator
Gerrat
  • 28,863
  • 9
  • 73
  • 101
  • One of the good points in this answer is also that it uses tuples as dictionary keys, which seems to be a good way to store data which is labelled with several IDs as in your case. – silvado Jun 02 '12 at 11:32
1

I would recommend either

  1. Writing a class that has functions that wrap common access pattern to your nested maps.
  2. Using an sqlite3 database.

EDIT:

I misread the question, sorry.

I can't think of a single data structure that would do this, though it wouldn't be too bad to replicate data slightly. Make player a class, and you could have rounds store a map of players to actions, and also have the player class contain a list of actions taken by that player.

cabbagebot
  • 386
  • 1
  • 12
  • thanks for the suggestions. But how does (1) solve the problem here? If it is the same underlying data structure, the wrapper functions still face the same problem as mentioned. (2) is not optimal here as I'd want to avoid any I/O. – skyork Jun 01 '12 at 22:11
  • I've edited my answer. I should point out that if there is a finite list of possible actions, having the data replicated as such has low overhead, as the player will only store references to the actions. – cabbagebot Jun 01 '12 at 22:32
  • 1
    @skyork: you can create a SQLite db entirely in RAM with `sqlite3.connect(":memory:")`. – Fred Foo Jun 01 '12 at 23:05
0

You could store a set of tuples where each tuple stores just plain (game_id, player_id, round_number, action). You might also just use interned strings of the player name instead of an id. If you don't know what analysis you are to do then this format leaves each field equally accessible for statistical analysis and is simple to convert to storage in a database if you feel the need in the future.

A named tuple might also be used.

Paddy3118
  • 4,704
  • 27
  • 38