I have a structure like this (pseudo code):

class Player {
    steamid: str
    hero: Hero
}
class Hero {
    class_id: str
    level: int
    xp: int
    skills: list[Skill]
}
class Skill {
    class_id: str
    level: int
}

Now I'm trying to store it into a database, and I gave my player a get_serialized_data() method which returns a tuple like so:

return (
    # players
    (steamid, hero.class_id),
    # heroes
    (steamid, hero.class_id, hero.level, hero.xp),
    # skills
    (
        (steamid, hero.class_id, skill.class_id, skill.level)
        for skill in hero.skills
    ),
)

Finally, I'm storing every player's data into the database at the same time, using three calls to executemany() to save:

  1. Every player's data in one executemany()
  2. Every hero's data in one executemany()
  3. Every skill's data in one executemany()

And here's my code to do that:

def save_all_data(*, commit=True):
    """Save every active player's data into the database."""
    players_data = []
    heroes_data = []
    skills_data = []
    for player in _players.values():
        player_data, hero_data, skills_data_ = player.get_serialized_data()
        players_data.append(player_data)
        heroes_data.append(hero_data)
        skills_data.extend(skills_data_)
    _database.save_players(players_data)
    _database.save_heroes(heroes_data)
    _database.save_skills(skills_data)
    if commit:
        _database.commit()

The "problem", as you can see, is that I construct three large lists. Is it possible to replace these lists with generators somehow? My _database.save_X() methods all accept generators, so it would save a lot of RAM.

Edit: Also, I don't want to loop through the players three times. So I'd love to get three generators somehow during one loop.

Markus Meskanen
  • *“three large lists”* – How large are we actually talking about? `player_data` and `hero_data` appear to be 2- and 4-tuples respectively. And if you’re using Python 3, then `skills_data_` should already be a generator of 4-tuples (since you are using a generator expression `(something for skill in hero.skills)`) – poke Jul 19 '16 at 20:59
  • Number of players can be anything from one to 5000, number of heroes is always equal to number of players, and each hero can have anything from 4 to 15 skills. And I'm talking about the `players_data`, `heroes_data`, and `skills_data` lists, hoping to turn them into generators of `(tuple, tuple, generator[tuple])` instead of lists of `(tuple, tuple, generator[tuple])`. – Markus Meskanen Jul 19 '16 at 21:00
  • It’s relevant because a list with 15 items is *nothing* (you gain nothing from a generator there). And even 5000 is not *that* much. You also have to consider where the data comes from. Is the data already in memory? Or are you reading it lazily from somewhere else? – poke Jul 19 '16 at 21:04
  • As your question stands, it’s not clear where the data is coming from and what exactly you want to turn into a generator. `player.get_serialized_data()` returns a three tuple with not much data, so that doesn’t make much sense, so I feel like you want to turn `_players.values()` into a generator but you don’t show at all what `_players` is. So I don’t know how we are supposed to help you here. – poke Jul 19 '16 at 21:07
  • 5000 players isn't that much, but remember that there's an equal amount of heroes and each hero has 4-15 skills. That's ~100,000 items. The data *does* already exist so it's already in RAM, but I'd still prefer to avoid the lists if possible. As I said, I want to turn the current `players_data`, `heroes_data` and `skills_data` into generators. They're currently lists. – Markus Meskanen Jul 19 '16 at 21:08
  • @poke Not `player_data` and `hero_data`, but `players_data` and `heroes_data`. And `skills_data` instead of `skills_data_`. – Markus Meskanen Jul 19 '16 at 21:29
  • Oh, sorry about that, now I get it! – poke Jul 19 '16 at 21:33
  • You're a lot more likely to have cost in `executemany` since it runs the sql statement for every single item individually instead of bulk inserting. You should write the code one way first, profile the performance, change it, and then see if performance improves. – Daenyth Jul 19 '16 at 22:34
  • Possible duplicate of [How to split a Python generator of tuples into 2 separate generators?](http://stackoverflow.com/questions/28030095/how-to-split-a-python-generator-of-tuples-into-2-separate-generators) – arekolek Jul 19 '16 at 23:27

1 Answer

There's no way to avoid storing O(len(players)) worth of data if you want to save the sets of your player, hero and skill data in separate operations on the database (rather than doing one operation for each player with their associated hero and skill data, or saving it all somehow in parallel).
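The per-player alternative mentioned above could look roughly like the following. This is only a sketch: it assumes a plain sqlite3 connection rather than the question's `_database` wrapper, and the table layout and the `fake_serialized_players` helper are hypothetical stand-ins for `player.get_serialized_data()`.

```python
import sqlite3


def save_all_data(conn, serialized_players, *, commit=True):
    """Write each player's rows as they are produced, without building lists."""
    cur = conn.cursor()
    for player_row, hero_row, skill_rows in serialized_players:
        cur.execute("INSERT INTO players VALUES (?, ?)", player_row)
        cur.execute("INSERT INTO heroes VALUES (?, ?, ?, ?)", hero_row)
        # skill_rows may be a generator; executemany accepts any iterable.
        cur.executemany("INSERT INTO skills VALUES (?, ?, ?, ?)", skill_rows)
    if commit:
        conn.commit()


# Hypothetical schema, assumed only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (steamid TEXT, hero_class_id TEXT)")
conn.execute(
    "CREATE TABLE heroes (steamid TEXT, class_id TEXT, level INTEGER, xp INTEGER)"
)
conn.execute(
    "CREATE TABLE skills "
    "(steamid TEXT, hero_class_id TEXT, class_id TEXT, level INTEGER)"
)


def fake_serialized_players(count):
    """Stand-in for calling get_serialized_data() on each active player."""
    for i in range(count):
        steamid = f"steam{i}"
        yield (
            (steamid, "warrior"),
            (steamid, "warrior", 1, 0),
            ((steamid, "warrior", f"skill{j}", 1) for j in range(4)),
        )


save_all_data(conn, fake_serialized_players(3))
```

Only one player's rows are alive at a time here, at the cost of many small statements instead of three large executemany() calls, so it is worth profiling both versions before choosing.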

Generators won't help you here. Even if you could come up with a generator that returned the hero and skill data, it would need to maintain a list (or some other data structure) in the background unless your three database saves were happening in parallel. You might want to compare what you're asking for to the implementation of itertools.tee, which creates several "copies" of an input iterator. It's only space efficient if you're iterating over the copies in parallel (with zip, for instance) rather than one by one. If you iterate over the copies one by one, it's essentially the same as copying the iterator's contents into a list and iterating over that list repeatedly.
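The buffering behaviour of itertools.tee can be observed with a small experiment. The `serialized_rows` generator and the `produced` log below are hypothetical names used only for illustration; they stand in for looping over the players and serializing each one.

```python
import itertools

produced = []  # records each time the source generator actually yields a row


def serialized_rows():
    """Hypothetical stand-in for serializing one player per iteration."""
    for i in range(3):
        produced.append(i)
        yield (f"steam{i}", f"hero{i}")


# One by one: draining the first copy pulls every row from the source,
# so tee has to keep all of them buffered for the second copy.
first, second = itertools.tee(serialized_rows())
players = [row[0] for row in first]
produced_after_first = len(produced)  # the source is already exhausted here
heroes = [row[1] for row in second]   # served entirely from tee's buffer

# In parallel: each row is pulled once and handed to both copies right away,
# so the buffer never needs to hold more than one row.
produced.clear()
first, second = itertools.tee(serialized_rows())
in_step = []
for player_row, hero_row in zip(first, second):
    in_step.append(len(produced))
```

In the first half the source is fully consumed before the second copy is touched, which is the same memory cost as building a list. In the second half the source advances one row per loop step.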

Blckknght