
I have a toy Python project that involves simulating the movement of objects across a grid. My constraints are:

  • The objects are my own arbitrarily defined class(es)
  • The objects can change position on the grid
  • Objects can be added or removed
  • Objects can have their attributes updated
  • I must iterate through all such objects
  • Object size might be nontrivial (relative to my machine's power)
  • Objects must be able to "see" space taken up on the grid (that is, the grid must be accessible by all objects)

What's the most efficient datatype/container in Python for me to store such objects?


My current thoughts:

  • Numpy array of type object - This seems the best case as I can then reference the objects by their position, but my understanding is numpy isn't really intended for this use case and may not be particularly efficient. Setting it up this way isn't particularly intuitive either.
  • List for the objects and numpy array for shared location updating - This allows a clean iteration through the objects and I could update the location both within class attributes and on the numpy array, but then it's not easy to reference an object specifically by location
  • Dictionary for the objects and numpy array for shared location updating - This would allow me to reference objects by location (keys) and update the master array so knowledge of those changes is shared, but then keeping objects while they move location is awkward (I think they'd need to be copied to new keys)

I feel like I may be missing something simple here unless a numpy grid really is the best option.

Josh
  • Numpy stands for *numeric* Python. It handles numeric operations on arrays very efficiently, because it processes them directly at the C language level. But the gain is much smaller, if any, when processing arbitrary Python objects, because a switch to a Python context is required for each and every iteration. – Serge Ballesta Aug 14 '22 at 18:59
  • The main problem lies in "*objects are my own arbitrarily defined class*". CPython objects are slow for computing. They are slow because of Python features like reference counting, dynamic type checking, user-defined type inference, etc. There is not much you can do about that except use tools like Cython (or Numba, though its support is experimental) that basically bypass CPython. This also means using static typing and giving up some Python features for the sake of performance. If you want to stick with the (slow) CPython *interpreter*, then lists are the best option (but still slow). – Jérôme Richard Aug 14 '22 at 19:31
  • List of lists is the only practical solution in Python (or for some structures `dict`). `numpy` arrays can store `objects`, but functionally they are similar to lists. You won't gain anything compared to lists. – hpaulj Aug 14 '22 at 19:39
  • Compromising the simplicity of just handling your class objects, maybe your objects could carry an index that addresses the performance-critical numerical data in `numpy` arrays, such as position, size, speed. When you add and remove objects, these arrays need of course to be updated. – Dr. V Aug 14 '22 at 19:43
  • @Dr.V They will carry their position either way. Now I can iterate through all objects to pick the ones having that attribute at some position to run an update, but that seems not great relative to accessing them directly by location initially. – Josh Aug 14 '22 at 20:12
  • @hpaulj List of lists makes sense if numpy doesn't offer benefits. I assume the idea to maintain the grid shape would be to replace objects rather than pop/remove them out. If so, would that indeed remove them from memory or does that just remove the pointer? – Josh Aug 14 '22 at 20:13
  • Lists contain pointers/references to objects. An object can be referenced many ways - by a variable, any number of elements of lists, values of dicts, etc. The fast and compact storage of `numpy` only works for numbers, things that can be stored in simple `c` arrays. – hpaulj Aug 14 '22 at 20:22
  • Unless there's more to add, if someone notes in an answer that numpy has no advantage over a list of lists (and that a list of lists is likely the best option in Python), I'm happy to accept it. – Josh Aug 14 '22 at 21:21
  • Does this answer your question? [What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?](https://stackoverflow.com/questions/26767694/what-are-the-benefits-drawbacks-of-a-list-of-lists-compared-to-a-numpy-array-o) – Vladimir Fokow Aug 15 '22 at 06:18
  • @VladimirFokow It adds clarity on the numpy array vs list of lists, but I think it would be useful for future readers if someone added an answer here that can be accepted. For instance, I think the other part of the answer is that Python doesn't really offer much better than these two options for the task described in the post. – Josh Aug 15 '22 at 13:10

1 Answer


Summarizing the comments above, since no one has posted an answer: numpy arrays are designed for numerical computation, so for arbitrary Python objects they offer little benefit over a list of lists. The question linked in the comments above goes into the nuances in more detail. Python doesn't really have a better option than a list of lists (or, for some structures, a `dict`), and a list of lists still accomplishes the goal and allows referencing objects by position. The grid shape can be maintained by initializing it with dummy placeholders and then replacing what sits at each position, rather than removing items outright.
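As a rough illustration of the placeholder idea, here is a minimal sketch. The `Thing` and `Grid` classes and their method names are hypothetical, made up for this example, and `None` is assumed as the dummy placeholder; adapt it to whatever your own classes look like.

```python
class Thing:
    """Hypothetical stand-in for an arbitrary user-defined class."""

    def __init__(self, name, row, col):
        self.name = name
        self.row = row
        self.col = col


class Grid:
    """Fixed-shape grid stored as a list of lists; None marks an empty cell."""

    def __init__(self, rows, cols):
        # Build each row separately so the rows are independent lists.
        self.cells = [[None for _ in range(cols)] for _ in range(rows)]

    def add(self, obj):
        if self.cells[obj.row][obj.col] is not None:
            raise ValueError("cell already occupied")
        self.cells[obj.row][obj.col] = obj

    def remove(self, obj):
        # Put the placeholder back so the grid keeps its shape.
        self.cells[obj.row][obj.col] = None

    def move(self, obj, row, col):
        target = self.cells[row][col]
        if target is not None and target is not obj:
            raise ValueError("target cell already occupied")
        # Only the reference moves; the object itself is not copied.
        self.cells[obj.row][obj.col] = None
        self.cells[row][col] = obj
        obj.row, obj.col = row, col

    def at(self, row, col):
        return self.cells[row][col]

    def objects(self):
        # Iterate over every object currently on the grid, skipping placeholders.
        for row in self.cells:
            for cell in row:
                if cell is not None:
                    yield cell


grid = Grid(3, 4)
a = Thing("a", 0, 0)
b = Thing("b", 2, 3)
grid.add(a)
grid.add(b)
grid.move(a, 1, 2)   # a is now reachable as grid.at(1, 2)
grid.remove(b)       # cell (2, 3) goes back to None
```

Replacing a cell with `None` only drops the grid's reference to the object. As noted in the comments, the object itself is freed once nothing else (a variable, another list, a dict value) still refers to it.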

Josh