
I'm looking to use class objects to store data from multiple simulations and save them in a list (each entry in the list being a new instance of the class). I'd like to ensure that each instance has exactly the same attributes (variables), but I'd like to assign them dynamically as the program runs, not all at once when the new instance is created.

Some suggestions to do that (what I'd use a structure for in C) are here, but if using a native class (preferred option over external modules or dictionary for simplicity/readability), I don't see a way to make sure that each instance has exactly the same attributes yet they are populated at different times. Please tell me if there's a solution for this.

I don't want to use the constructor below because the attributes would all have to be populated at once. I guess I'd then have to create the object at the end of the simulation and define additional variables to store the data in the meantime. It's also not great because I have a large number of attributes, so the argument list in the brackets would be very long.

simulations = []

class output_data_class:

    def __init__(self, variable1, variable2):
        self.variable1 = variable1
        self.variable2 = variable2

# simulation, generates the data to store in variable1, variable2

variable1 = 'something' # want to avoid these interim variables
variable2 = 123 # want to avoid these interim variables

new_simulation = output_data_class(variable1, variable2) # create an object for the current simulation

simulations.append(new_simulation) # store the current simulation in the list

Right now I'm using something like this:

simulations = []

class output_data_class:

    pass

new_simulation = output_data_class() # create an object for the current simulation

# simulation, generates the data to store in variable1

new_simulation.variable1 = 'something'

# another part of simulation, generates the data to store in variable2

new_simulation.variable2 = 123 

simulations.append(new_simulation) # store data from current simulation

This allows me to add the data as they are produced during the simulation (can be things like an array of data, then something calculated from that array etc.) My gut feeling is that the above is bad practice - it isn't immediately clear what the instance attributes are supposed to be, and it doesn't protect against creating totally new attributes due to typos etc (example below). I'd like to impose that each instance must have the same attributes.

new_simulation.variabel2 = 123

Note the typo above - this would create a new variable specifically for this instance, right?
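A quick way to check this is to inspect the instance's attribute dictionary with `vars()`. The sketch below uses a hypothetical `OutputDataClass` with no restrictions, mirroring the empty class above:

```python
class OutputDataClass:
    pass

sim = OutputDataClass()
sim.variable2 = 123
sim.variabel2 = 456  # typo: Python silently creates a second attribute

# vars() returns the per-instance attribute dictionary
instance_attrs = vars(sim)
print(instance_attrs)
# -> {'variable2': 123, 'variabel2': 456}
```

Both names end up on this one instance only; other instances of the class are unaffected.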

I would like to be able to declare the admissible attributes (including their type if possible) in the class definition, but obviously not as class variables, as I need to populate them separately for each instance. (And to reiterate, not in the `__init__` method, because then I believe I'd have to populate all the attributes at once.)

wjandrea
Henry
  • You could work with a custom `__setattr__` that checks if the provided key is in a list of valid keys `("variable1", "variable2")` and raises an error otherwise. – Guimoute Mar 16 '23 at 18:53
  • Welcome to Stack Overflow. "I don't see a way to make sure that each instance has exactly the same variables yet they are allocated at different times." It's hard to understand the question, because the word *variables* doesn't make sense in this context. I can think of at least two things you could mean, neither of which is using the word correctly. (Also, we don't really "allocate" objects; allocation is done for the *memory used to store* objects, but Python takes care of that automatically.) – Karl Knechtel Mar 16 '23 at 18:58
  • It seems like what you mean is that the instances should have the same *set of attributes*, but you don't want to decide the attribute names when writing the class. Is that right? In that case - when *do* you figure out the attribute names? **What actually is a "simulation"** in the context of your program? – Karl Knechtel Mar 16 '23 at 19:00
  • "Some suggestions to do that (what I'd use a structure for in C) are here, but if using a native class (preferred option over external modules or dictionary for code simplicity/readability)" - I can't understand either the distinction you are trying to draw, or your reasoning. Things like `dataclass`, `namedtuple` etc. create perfectly ordinary, "native" Python classes, they just hide a bunch of boilerplate code from you. The modules you need in order to create them come from the Python standard library, and they are perfectly simple and readable. – Karl Knechtel Mar 16 '23 at 19:03
  • Hi @KarlKnechtel, thank you for the reply! My "simulation" is just a program that generates some data, each instance would be storing the data from a single run (a physics simulation with some new parameters). I definitely want to fix the name of the attributes when defining the class if possible, they should be the same for all the instances. – Henry Mar 16 '23 at 19:04
  • Also thanks for the pointers, I'll try and replace the confusing terms in the question. – Henry Mar 16 '23 at 19:06
  • "Don't want to use the below because the variables would all have to be populated at once" - well, no, they don't. Python does not have real data hiding and even its approximations are not the default. In the first code example, there is nothing preventing you from writing `new_simulation.variable1 = 'something else'` later. It also isn't necessary to have "interim variables" to call a constructor, just like it isn't in order to call **anything else** - `output_data_class('something', 123)` works fine. – Karl Knechtel Mar 16 '23 at 19:08
  • @KarlKnechtel but then if I want to create a new instance before all the data is generated, I'd have to make up some values for the attributes in the call only to replace them later? like `new_simulation = output_data_class('something random', 345)` and later do `new_simulation.variable1 = 'the correct thing'`? Also how do I ensure that `variable1` is indeed an allowed attribute of the instance? (As opposed to creating a new attribute if there is a typo or something? – Henry Mar 16 '23 at 19:17
  • It's not possible to come up with a coherent Q&A here the way things are going. Please read [ask] and note well that this is **not a discussion forum**. If you are generally just trying to figure out a *design for a project*, that is better suited to, for example, https://reddit.com/r/learnpython. – Karl Knechtel Mar 16 '23 at 19:20
  • Thanks Karl, I'll have a read through that and see if there's any way I can clarify the original question. But I think it's a specific enough question to belong here. – Henry Mar 16 '23 at 19:26
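
The custom `__setattr__` idea from the first comment could be sketched roughly as follows. This is only an illustration: the class name `OutputData`, the `_allowed` tuple and the error message are assumptions, not something given in the thread.

```python
class OutputData:
    # Hypothetical whitelist of admissible attribute names
    _allowed = ("variable1", "variable2")

    def __setattr__(self, name, value):
        # Reject any name that is not in the whitelist (e.g. a typo)
        if name not in self._allowed:
            raise AttributeError(
                f"{type(self).__name__} has no attribute {name!r}"
            )
        super().__setattr__(name, value)


sim = OutputData()
sim.variable1 = "something"  # allowed

try:
    sim.variabel2 = 123  # typo, rejected
except AttributeError as exc:
    typo_error = str(exc)
    print(typo_error)
```

Attributes can still be assigned one at a time, in any order; only names outside `_allowed` are refused.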

2 Answers


One way to do this is to use a dataclass with slots=True (added in Python 3.10) to restrict the instance attributes, and use None as a default value. Although, it's not clear if this is a good design based on the question; hopefully, if nothing else, this is a good jumping-off point.

from dataclasses import dataclass

@dataclass(slots=True)
class OutputData:  # BTW, the convention is UpperCamelCase for class names
    variable1: str | None = None  # None means "not populated yet"
    variable2: int | None = None

Used like so:

new_simulation = OutputData()

print(new_simulation)
# -> OutputData(variable1=None, variable2=None)

new_simulation.variable1 = 'something'
new_simulation.variable2 = 123

print(new_simulation)
# -> OutputData(variable1='something', variable2=123)

Trying to assign to arbitrary attributes raises an error:

new_simulation.variabel2 = 123
# -> AttributeError: 'OutputData' object has no attribute 'variabel2'


wjandrea

I would like to be able to declare the admissible attributes (including their type if possible) in the class definition, but obviously not as class variables as I need to populate them separately for each instance. (And to reiterate, not in the __init__ method because then I believe I'd have to populate all the attribute at once.)

It sounds like you could use dataclasses.

Python classes support a special attribute called __slots__. Its main purpose is to save memory, but it can also be used to forbid assigning to attributes that the class has not explicitly declared.

So you could implement your idea like this:

from dataclasses import dataclass

@dataclass(slots=True, init=False)
class Foo:
    bar: int
    baz: str


f = Foo()
f.bar = 10
f.baz = 'a string'
f.qux = 'another string'  # Raises AttributeError because 'qux' is not in __slots__

Explanations:

  • The slots=True argument to the dataclass decorator causes the dataclass to generate a __slots__ attribute.
  • By default, dataclasses generate an __init__ method that requires a value for every field without a default. With init=False, that method is not generated, so Foo() takes no arguments and the fields stay unset until you assign them.
Nick ODell
  • With `init=False`, trying to print the class before all its attributes are assigned raises `AttributeError: 'Foo' object has no attribute 'bar'`. So you might want to add `repr=False` as well. I took a different approach in [my answer](/a/75760977/4518341) and put a default value. – wjandrea Mar 16 '23 at 19:51