Is there a good reason why classes shouldn't include a list of all objects created?

Question

I'm learning a lot about Python. For one of my programs, I need to compare all the objects that have been created, so I put them in a list. I thought it might be simpler to create a class variable that holds every object created.

This seems so obvious to me that I wonder why it isn't done all the time, so I figure there must be a really really good reason for that.

So for something like

class Basket:
    baskets = []  # class-level list shared by every instance
    def __init__(self, id, volume):
        self.id = id
        self.volume = volume
        Basket.baskets.append(self)

Why is this not done more often? It seems obvious. My assumption is that there are very good reasons why you wouldn't, so I'm curious to hear them.

I've never needed to keep a list of all objects created, only the ones I've knowingly created. — chepner, Oct 24 '21 at 13:35
One reason is that if you do that there's no way to get them _out_ of the list again, so no instance you create can ever be garbage collected (you could avoid this with a [weakref](https://docs.python.org/3/library/weakref.html)). — jonrsharpe, Oct 24 '21 at 13:37
If some of your code wants a list of `Basket` objects then it should make its own - and it has no business including `Basket` objects outside of its own purview. Also, this is mutable global state - so it's a bad idea also for the same reasons any mutable global state is bad. — kaya3, Oct 24 '21 at 13:41
see https://stackoverflow.com/questions/12101958/how-to-keep-track-of-class-instances — Rad, Oct 24 '21 at 13:41
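
The garbage-collection concern raised in the comments can be illustrated with a small sketch. It assumes CPython's reference counting (with an explicit `gc.collect()` as a precaution), and the class names `StrongBasket` and `WeakBasket` are hypothetical, not from the question.

```python
import gc
import weakref


class StrongBasket:
    """Hypothetical variant: the class keeps strong references to every instance."""
    instances = []

    def __init__(self, volume):
        self.volume = volume
        StrongBasket.instances.append(self)


class WeakBasket:
    """Hypothetical variant: the class keeps only weak references."""
    instances = weakref.WeakSet()

    def __init__(self, volume):
        self.volume = volume
        WeakBasket.instances.add(self)


strong = StrongBasket(10)
weak = WeakBasket(10)

# Drop the only references the calling code holds.
del strong
del weak
gc.collect()

strong_count = len(StrongBasket.instances)  # the list still keeps the object alive
weak_count = len(WeakBasket.instances)      # the WeakSet entry is gone
```

A `WeakSet` only fixes the lifetime problem; the registry is still mutable global state shared by all code that uses the class.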

user2357112 · Answer 1 · 2022-07-15T02:36:41.407

This is one of those ideas new programmers come up with over and over again, and it turns out to be unhelpful and counterproductive in practice. It's important to be able to manage the objects you create, but a single class-managed list of every instance ever created does a very bad job of that.

The core problem is that the "every" in "every object created" is much too broad. Code that actually needs to operate on every single instance of a specific class is extremely rare. Much more commonly, code needs to operate on every instance that particular code creates, or every member of a particular group of objects.
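
A minimal sketch of that more common case, where the code that creates the baskets also owns the list of them. The `make_baskets` helper and the volumes are hypothetical; the `Basket` class mirrors the one in the question without the class-level list.

```python
class Basket:
    def __init__(self, id, volume):
        self.id = id
        self.volume = volume


def make_baskets(volumes):
    """Hypothetical helper: build baskets and hand the list back to the caller."""
    return [Basket(i, v) for i, v in enumerate(volumes)]


# The caller owns this list and decides what belongs in it.
my_baskets = make_baskets([5, 12, 8])

# Comparing "all the baskets" now means all the baskets this code created.
largest = max(my_baskets, key=lambda b: b.volume)
largest_id = largest.id
```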

Using a single list of all instances makes your code inflexible. It's a design that encourages writing code to operate on "all the instances" instead of "all the instances the code cares about". When the scope of a program expands, that design makes it really hard to create instances the code doesn't or shouldn't care about.

Plus, a list of every instance is a data structure with almost no structure. It does nothing to express the relationships between objects. If two objects are in such a list, that just says "both these objects exist". You quickly end up needing more complex data structures to represent useful information about your objects, and once you have those, the class-managed list doesn't do anything useful.

For example, you've got a Basket class. We don't have enough information to tell whether this is a shopping basket, or a bin-packing problem, or what. "Volume" suggests maybe it's a bin-packing problem, so let's go with that. We've got a number of items to pack into a number of baskets, and the solver has to know about "all the baskets" to figure out how to pack items into baskets... except, it really needs to know about all the baskets in this problem. Not every instance of Basket in the entire program.

What if you want to solve two bin-packing problems, with two separate sets of baskets? Throwing all the baskets into a single list makes it hard to keep track of things. What if you want to solve two bin-packing problems at the same time, maybe in two different threads? Then you can't even just clear the list when you're done with one problem before moving on to the next.
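
One way to keep two problems apart, sketched under assumptions: a hypothetical `PackingProblem` class owns its own baskets, and a naive first-fit loop stands in for a real solver. None of these names come from the question.

```python
class Basket:
    def __init__(self, id, volume):
        self.id = id
        self.volume = volume
        self.items = []

    def free_space(self):
        return self.volume - sum(self.items)


class PackingProblem:
    """Hypothetical container: each problem holds only its own baskets."""

    def __init__(self, volumes):
        self.baskets = [Basket(i, v) for i, v in enumerate(volumes)]

    def pack_first_fit(self, item_sizes):
        """Naive first-fit placement; returns the items that did not fit."""
        unplaced = []
        for size in item_sizes:
            for basket in self.baskets:
                if basket.free_space() >= size:
                    basket.items.append(size)
                    break
            else:
                # No basket in this problem had room for the item.
                unplaced.append(size)
        return unplaced


# Two independent problems; neither can see the other's baskets.
problem_a = PackingProblem([10, 10])
problem_b = PackingProblem([4])

left_a = problem_a.pack_first_fit([6, 6, 3])
left_b = problem_b.pack_first_fit([5])
```

Because each `PackingProblem` has its own list, nothing needs to be cleared between runs, and two solvers working on different problems do not share any basket state.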

What if you want to write unit tests? Those will need to create Basket instances. If you have a class-managed list of all baskets, the tests will add Basket instances to that list, making the tests interfere with each other. The contents of the list when one test runs will depend on test execution order. That's not good. Unit tests are supposed to be independent of each other.
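
The interference can be reproduced with a small sketch. The two `check_` functions are hypothetical stand-ins for unit tests, and the class follows the question's design, written so that it runs.

```python
class Basket:
    baskets = []  # class-level list shared by every instance

    def __init__(self, id, volume):
        self.id = id
        self.volume = volume
        Basket.baskets.append(self)


def check_single_basket():
    """Hypothetical test: expects to see only the basket it created."""
    Basket(1, 10)
    return len(Basket.baskets)


def check_two_baskets():
    """Hypothetical test: expects to see only its two baskets."""
    Basket(1, 5)
    Basket(2, 7)
    return len(Basket.baskets)


# Run in isolation, the functions would report 1 and 2 baskets respectively.
# Run one after the other, the second also sees the first one's leftover.
seen_by_first = check_single_basket()   # 1
seen_by_second = check_two_baskets()    # 3, not 2
```

Swapping the order of the two calls changes both results, which is exactly the order dependence described above.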

Consider the built-in classes. int, dict, str, classes like those. Have you ever wanted a list of every int in your entire program, or every string? It wouldn't be very useful. It'd include all sorts of stuff you don't care about, and stuff you didn't even know existed. Random constants from modules you've never heard of, os.name, the Python copyright string, etc. You wouldn't have the slightest clue where most of it even came from. How would you get anything useful done with a list like that?

On a smaller scale, the same thing applies to a list of every instance of a class you write. Sure, your class won't be used in quite as many situations as a class like int, but as the scope of a program expands, your class will end up used in more ways, and those uses probably won't need to know about each other. A single list of instances intrinsically makes it hard for different uses of a class to avoid interfering with each other.
