I need to make a list of objects out of a NumPy array (or a pandas DataFrame). Each row holds all the attribute values for one object (see example).
import numpy as np
class Dog:
    def __init__(self, weight, height, width, girth):
        self.weight = weight
        self.height = height
        self.width = width
        self.girth = girth
dogs = np.array([[5, 100, 50, 80], [4, 80, 30, 70], [7, 120, 60, 90], [2, 50, 30, 50]])
# list comprehension with indexes
dog_list = [Dog(dogs[i][0], dogs[i][1], dogs[i][2], dogs[i][3]) for i in range(len(dogs))]
My real data is of course much bigger (up to a million rows with 5 columns), so iterating line by line and looking up the correct index takes ages. Is there a way to vectorize this or generally make it more efficient/faster? I tried finding ways myself, but I couldn't find anything translatable, at least at my level of expertise.
It's extremely important that the order of rows is preserved though, so if that doesn't work out, I suppose I'll have to live with the slow operation.
Cheers!
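One possible direction, sketched here rather than benchmarked: the objects themselves still have to be constructed one at a time in Python, so the step cannot be truly vectorized, but the repeated `dogs[i][j]` lookups can be avoided by converting the array to nested lists once with `tolist()` and unpacking each row. The example reuses the `Dog` class and data from above; any speedup on a million rows is an assumption to verify with a timing run.

```python
import numpy as np


class Dog:
    def __init__(self, weight, height, width, girth):
        self.weight = weight
        self.height = height
        self.width = width
        self.girth = girth


dogs = np.array([[5, 100, 50, 80], [4, 80, 30, 70], [7, 120, 60, 90], [2, 50, 30, 50]])

# tolist() converts the whole array to nested Python lists in a single call,
# so each row can be unpacked without indexing into the array element by element.
dog_list = [Dog(*row) for row in dogs.tolist()]

# Row order is preserved: dog_list[i] is built from row i of the array.
print([d.weight for d in dog_list])  # [5, 4, 7, 2]
```

For a DataFrame, `df.to_numpy().tolist()` would be the analogous starting point, assuming the column order matches the constructor's argument order.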
EDIT - regarding question about np.vectorize:
This is part of my actual code along with some actual data:
import numpy as np
class Particle:
    TrackID = 0

    def __init__(self, uniq_ident, intensity, sigma, chi2, past_nn_ident, past_distance, aligned_x, aligned_y, NeNA):
        self.uniq_ident = uniq_ident
        self.intensity = intensity
        self.sigma = sigma
        self.chi2 = chi2
        self.past_nn_ident = past_nn_ident
        self.past_distance = past_distance
        self.aligned_y = aligned_y
        self.aligned_x = aligned_x
        self.NeNA = NeNA
        self.new_track_length = 1
        self.quality_pass = True
        self.re_seeder(self.NeNA)

    def re_seeder(self, NeNA):
        if np.isnan(self.past_nn_ident):
            self.newseed = True
            self.new_track_id = Particle.TrackID
            print(self.new_track_id)
            Particle.TrackID += 1
        else:
            self.newseed = False
            self.new_track_id = None
data = np.array([[0.00000000e+00, 2.98863746e+03, 2.11794100e+02, 1.02241467e+04, np.nan, np.nan, 9.00081968e+02, 2.52456745e+04, 1.50000000e+01],
                 [1.00000000e+00, 2.80583577e+03, 4.66145720e+02, 6.05642671e+03, np.nan, np.nan, 8.27249728e+02, 2.26365501e+04, 1.50000000e+01],
                 [2.00000000e+00, 5.28702810e+02, 3.30889610e+02, 5.10632793e+03, np.nan, np.nan, 6.03337243e+03, 6.52702811e+04, 1.50000000e+01],
                 [3.00000000e+00, 3.56128350e+02, 1.38663730e+02, 3.37923885e+03, np.nan, np.nan, 6.43263261e+03, 6.14788766e+04, 1.50000000e+01],
                 [4.00000000e+00, 9.10148200e+01, 8.30057400e+01, 4.31205993e+03, np.nan, np.nan, 7.63955009e+03, 6.08925862e+04, 1.50000000e+01]])
Particle.TrackID = 0
particles = np.vectorize(Particle)(*data.transpose())
l = [p.new_track_id for p in particles]
The curious thing is that the print statement inside the re_seeder function, "print(self.new_track_id)", prints 0, 1, 2, 3, 4, 5: six values for five rows.
If I then take the particle objects and build a list of their new_track_id attributes with "l = [p.new_track_id for p in particles]", the values are 1, 2, 3, 4, 5.
So somewhere, somehow, the first object is either lost, overwritten, or something else is happening that I don't understand.
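A likely explanation, offered as a sketch: when `otypes` is not given, `np.vectorize` calls the function once on the first element just to infer the output type and then discards that result, so the constructor runs one extra time and the class-level counter is already at 1 when the real objects are built. The example below uses a hypothetical `Counter` class as a stand-in for `Particle` and compares the IDs with and without `otypes=[object]`.

```python
import numpy as np


class Counter:
    """Hypothetical stand-in for Particle: hands out consecutive IDs."""
    next_id = 0

    def __init__(self, value):
        self.value = value
        self.track_id = Counter.next_id
        Counter.next_id += 1


values = np.array([10.0, 20.0, 30.0])

# Without otypes, np.vectorize makes one trial call on the first element
# to determine the output type, so the constructor runs one extra time.
Counter.next_id = 0
objs = np.vectorize(Counter)(values)
ids_without = [o.track_id for o in objs]

# With otypes specified, the trial call is skipped.
Counter.next_id = 0
objs = np.vectorize(Counter, otypes=[object])(values)
ids_with = [o.track_id for o in objs]

print(ids_without)
print(ids_with)
```

Note that `np.vectorize` is documented as a convenience wrapper around a Python loop rather than a performance tool, so a plain comprehension such as `[Particle(*row) for row in data.tolist()]` may be just as fast and avoids the extra constructor call altogether.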