
I have a class A with multiple fields a, b, c. I have a list of objects of this class A. Now, I want to extract 3 lists, first containing the field a's values from all the objects, second list containing the field b's values and third c's values.

I found the answers below.
One answer says I can use a list comprehension:

a_list=[obj.a for obj in obj_list]

Another answer says to use a generator expression to conserve memory:

a_list=(obj.a for obj in obj_list)

Now, my question is: will this work when I want to extract multiple attributes? Suppose I write the line three times, as below:

a_list=(obj.a for obj in obj_list)
b_list=(obj.b for obj in obj_list)
c_list=(obj.c for obj in obj_list)

I'll be iterating through the list three times. Won't that be costly? In that case, is it better to use a for loop?

a_list, b_list, c_list = [], [], []  # the target lists must exist before appending
for obj in obj_list:
    a_list.append(obj.a)
    b_list.append(obj.b)
    c_list.append(obj.c)

Which is faster, and which is the better approach? Is there another, better-optimized way? Thanks!

Nagabhushan S N

1 Answer

Any time you ask "is X faster than Y?", you need to measure.

You can devise a way to avoid passing over your list three times.

That approach might still not be faster, though, because it makes the code more complicated and computationally more expensive.

One way to avoid going through the list of objects three times is to use zip and map, like so:

class O:
    def __init__(self,a,b,c):
        self.a=a
        self.b=b
        self.c=c
    def __str__(self):
        return f"#{self.a} {self.b} {self.c}#"
    def __repr__(self): return str(self)

obj = [O(a,a**4,1.0/a) for a in range(2,20)]

print(obj)

# use a generator to build a 3-tuple of each object's attributes and unpack
# those tuples into zip, which groups them into your three lists
a,b,c  = map(list, zip( *((e.a,e.b,e.c) for e in obj)) )

print(a,b,c )

Objects:

[#2 16 0.5#, #3 81 0.3333333333333333#, #4 256 0.25#, #5 625 0.2#, 
 #6 1296 0.16666666666666666#, #7 2401 0.14285714285714285#, #8 4096 0.125#,
 #9 6561 0.1111111111111111#, #10 10000 0.1#, #11 14641 0.09090909090909091#, 
 #12 20736 0.08333333333333333#, #13 28561 0.07692307692307693#, 
 #14 38416 0.07142857142857142#, #15 50625 0.06666666666666667#, 
 #16 65536 0.0625#, #17 83521 0.058823529411764705#, 
 #18 104976 0.05555555555555555#, #19 130321 0.05263157894736842#]

Result:

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] 

[16, 81, 256, 625, 1296, 2401, 4096, 6561, 10000, 14641, 20736, 28561, 
 38416, 50625, 65536, 83521, 104976, 130321] 

[0.5, 0.3333333333333333, 0.25, 0.2, 0.16666666666666666, 0.14285714285714285, 
 0.125, 0.1111111111111111, 0.1, 0.09090909090909091, 0.08333333333333333,
 0.07692307692307693, 0.07142857142857142, 0.06666666666666667, 0.0625, 
 0.058823529411764705, 0.05555555555555555, 0.05263157894736842]
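The unpacking step is easier to follow on a small input. Here is a minimal sketch that uses made-up tuples in place of the objects' attributes: `zip(*rows)` passes each tuple as a separate argument, so zip pairs up all the first items, then all the second items, and so on, which turns rows into columns.

```python
# Hypothetical sample data: each tuple stands in for (e.a, e.b, e.c).
rows = [(1, "x", 0.5), (2, "y", 0.25), (3, "z", 0.125)]

# zip(*rows) is the same call as zip(rows[0], rows[1], rows[2]).
columns = list(zip(*rows))
print(columns)  # [(1, 2, 3), ('x', 'y', 'z'), (0.5, 0.25, 0.125)]

# map(list, ...) converts each resulting tuple into a list.
a, b, c = map(list, zip(*rows))
print(a, b, c)  # [1, 2, 3] ['x', 'y', 'z'] [0.5, 0.25, 0.125]
```

One caveat: with an empty input, zip yields nothing, so unpacking into three names raises a ValueError. The three separate comprehensions simply return three empty lists in that case.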

You would still have to measure whether that's faster than going through the list of objects three times.

And even if it were slower for 18 elements, it might be faster for 2 million. So which one to use depends heavily on the circumstances.


Timings:

s = """
class O:
    def __init__(self,a,b,c):
        self.a=a
        self.b=b
        self.c=c
    def __str__(self):
        return f"#{self.a} {self.b} {self.c}#"
    def __repr__(self): return str(self)

# changed to ** 2 instead of 4
# changed to 200 elements
obj = [O(a,a**2,1.0/a) for a in range(2,200)] 
"""

code1="""
a,b,c  = map(list,zip( *((e.a,e.b,e.c) for e in obj))  )
"""
code2="""
a1 = [e.a for e in obj]
b1 = [e.b for e in obj]
c1 = [e.c for e in obj]
"""

from timeit import timeit

print(timeit(code1,setup=s,number=100000))
print(timeit(code2,setup=s,number=100000))

Result:

7.969175090000135  # map + zip
5.124133489000087  # three list comprehensions
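The timing above uses one list size and does not cover the plain for loop from the question. Below is a sketch of how the measurement could be extended to include it and to try several sizes. The function names and the `time_all` helper are made up for illustration, and the class is a stripped-down `O`. No numbers are shown, because they depend on the machine and the Python version.

```python
from timeit import timeit


class O:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def zip_map(objs):
    # One pass over the objects, then zip transposes the 3-tuples.
    a, b, c = map(list, zip(*((e.a, e.b, e.c) for e in objs)))
    return a, b, c


def three_comprehensions(objs):
    # Three separate passes over the list.
    return [e.a for e in objs], [e.b for e in objs], [e.c for e in objs]


def single_loop(objs):
    # One pass with three append calls per object (the loop from the question).
    a, b, c = [], [], []
    for e in objs:
        a.append(e.a)
        b.append(e.b)
        c.append(e.c)
    return a, b, c


def time_all(n, number=50):
    """Hypothetical helper: return {function name: seconds} for n objects."""
    objs = [O(i, i ** 2, 1.0 / i) for i in range(2, n + 2)]
    funcs = (zip_map, three_comprehensions, single_loop)
    return {f.__name__: timeit(lambda: f(objs), number=number) for f in funcs}


if __name__ == "__main__":
    for n in (20, 200, 20_000):
        print(n, time_all(n))
```

Running it on your own data sizes is the only reliable way to decide. If all three turn out close, the most readable version is a reasonable default.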
Patrick Artner