I have the following data structure:

One sample contains 5 vectors. Within each vector all the elements belong to the same class, but the class differs from one vector to the next. These vectors are really big, with thousands of elements each. I usually have several (5-10) samples.

At the moment I have a vector for every sample that contains the vectors of the classes, and I store the vectors of the samples in one outer vector so I can manage all the samples at once.

I use vectors because while filling my dataset I use .append(). Later on I won't change the data, just iterate through it and analyze it.

My problem is with memory: the dataset currently eats a lot of it, so some optimization would be great.

That's why I'm asking: is there a better way to store this dataset?

I've heard that an array is better if I don't change my data. Is it worth converting everything to arrays after loading it as vectors? What do you recommend?

For example, the dataset below is similar to mine:

class van:
    # some data
    pass

class bus:
    # some more data
    pass

class motorcycle:
    # something else
    pass

all_data = []
for _ in range(7):
    vans = [van() for _ in range(5000)]
    buses = [bus() for _ in range(2000)]
    mcycles = [motorcycle() for _ in range(3000)]
    dataset = [vans, buses, mcycles]
    all_data.append(dataset)
srikavineehari

2 Answers

If you want to keep your current code intact (minimizing work), you may consider replacing lists with lazylist: lazylist@github

internety

Considering that you need to keep the class structure, you can get a drastic improvement in memory consumption just by using __slots__. When a new object is created, only the attributes listed in __slots__ are allowed, and because instances no longer carry a per-instance __dict__, they use much less memory. Check out this question.
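As a minimal sketch of how this could look for the classes in the question (the attribute names length and capacity are made up for illustration):

class van:
    # restricts instances to these attributes and removes the per-instance __dict__
    __slots__ = ('length', 'capacity')

    def __init__(self, length=0.0, capacity=0):
        self.length = length
        self.capacity = capacity

vans = [van() for _ in range(5000)]  # same usage as before, just lighter instances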

Another approach would be to use a structured array from NumPy, but this depends on the exact nature of your data.
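For example, a minimal sketch assuming each record can be reduced to a few fixed numeric fields (the field names below are made up):

import numpy as np

# one contiguous block of memory for 5000 records instead of 5000 Python objects
van_dtype = np.dtype([('length', np.float32), ('capacity', np.int32)])
vans = np.zeros(5000, dtype=van_dtype)

vans['length'] = 4.5        # set a whole column at once
vans[0]['capacity'] = 3     # or access individual records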

tupui