
I'm writing a Python program to perform certain processing on data for a couple of million (N) users. The outcome of the process is several 1D arrays for each user (the process is applied to each user separately). I need set and get functions for each user's output data.

I have two options for implementing this (see the sketch below):

1. Create a class with attributes of size N by column_size, so one object that contains the big arrays.
2. Create a class for each user and store instances of this class in a list, so a list of N objects.
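For concreteness, here is roughly what I mean by the two options (a minimal sketch; the names `AllUsersData`, `UserData`, `result_a` and `result_b` are just placeholders for my real output arrays):

```python
import numpy as np

# Option 1: one object holding big N-by-column_size arrays
class AllUsersData:
    def __init__(self, n_users, column_size):
        self.result_a = np.zeros((n_users, column_size))
        self.result_b = np.zeros((n_users, column_size))

    def get(self, user_id):
        return self.result_a[user_id], self.result_b[user_id]

    def set(self, user_id, a, b):
        self.result_a[user_id] = a
        self.result_b[user_id] = b

# Option 2: one small object per user, kept in a list of length N
class UserData:
    def __init__(self, column_size):
        self.result_a = np.zeros(column_size)
        self.result_b = np.zeros(column_size)

users = [UserData(column_size=10) for _ in range(1_000)]  # a list of N objects
```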

My question is: what are the pros and cons of each approach in terms of speed and memory consumption?

  • Not sure if this is a task for a database instead of a Python program? What kind of data are you processing? Do you need all of it at once in RAM? – Semo Jul 03 '19 at 11:07
  • Sorry... If you are after speed, have a look at a classic C/C++ program where the data fits into a cache line, if possible. What exactly do you want to achieve? Also, it's not Pythonic to use getters and setters. See this SO entry: https://stackoverflow.com/questions/2627002/whats-the-pythonic-way-to-use-getters-and-setters – Semo Jul 03 '19 at 11:17

1 Answer

The question is rather broad, so I will not go beyond generalities.

If you intend to process user by user, then it makes sense to have one object per user.

On the other hand, if you mainly process all users at the same time, attribute by attribute, then it makes sense to have one class per attribute, each object containing that attribute for all users. That way, if memory becomes scarce, you can save everything to disk and only keep one user (resp. one attribute) in memory.
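As an illustration of that second layout, here is a minimal sketch (the class name, attribute names and file naming are made up, and it assumes the outputs fit in NumPy arrays):

```python
import numpy as np

# Attribute-oriented layout: one object per attribute, holding that
# attribute for all N users in a single 2D array.
class Attribute:
    def __init__(self, name, n_users, column_size):
        self.name = name
        self.values = np.zeros((n_users, column_size))

    def get(self, user_id):
        return self.values[user_id]

    def set(self, user_id, row):
        self.values[user_id] = row

    # If memory becomes scarce, the whole attribute can be flushed to
    # disk and reloaded later.
    def save(self, directory="."):
        np.save(f"{directory}/{self.name}.npy", self.values)
        self.values = None  # release the memory

    def load(self, directory="."):
        self.values = np.load(f"{directory}/{self.name}.npy")
```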

Serge Ballesta