
I have 3 classes: Group, Customer, Product. Each Group contains a list of Customers [C1, C2, C3, ...] and each Customer contains a list of Products he wants to buy [P1, P2, P3, ...]. At the top level I want to do various aggregations, for example how much the total order is worth. I end up with nested for loops that become very slow as the number of groups, customers and products increases:

total_order = 0
for customer in group:          # Group iterates over its Customers
    for product in customer:    # Customer iterates over its Products
        total_order += product.price * product.amount

What would you recommend in terms of structuring the code to make it much faster?
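For reference, a minimal runnable sketch of the structure the loops above assume (the products/customers attribute names and the __iter__ methods are illustrative, not the exact real classes):

from dataclasses import dataclass

@dataclass
class Product:
    price: float
    amount: int

@dataclass
class Customer:
    products: list            # list of Product

    def __iter__(self):
        return iter(self.products)

@dataclass
class Group:
    customers: list           # list of Customer

    def __iter__(self):
        return iter(self.customers)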

pam
2 Answers


How big are these groups, etc.? I'd be amazed if these operations were a bottleneck. Are you going out to a DB inside your loops?

You can use a generator expression with sum() instead, but I'm not sure it gains you anything:

total_order = sum(p.price * p.amount for c in group for p in c)
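If you want to check whether the loop itself is the problem, here's a quick timing sketch (the namedtuple Product and the 100 × 200 sizes are a simulation, not your actual classes):

import timeit
from collections import namedtuple

Product = namedtuple("Product", ["price", "amount"])

# Simulate 100 customers with 200 products each
group = [[Product(9.99, 3) for _ in range(200)] for _ in range(100)]

def total_order():
    return sum(p.price * p.amount for c in group for p in c)

# Average seconds per run; if this comes out in the low milliseconds,
# the slowness is probably coming from somewhere else (e.g. a DB/ORM)
print(timeit.timeit(total_order, number=100) / 100)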

Aidan Kane
  • any group can contain 100 Customers with up to 200 products each – pam Jan 15 '20 at 09:28
  • @pam I'm fairly confident that you're doing something else in your loops that's causing the issue - if I simulate your data (100 * 200) I can complete the operation in 1.82ms. Maybe you're using an ORM that's loading data from the DB quietly? Are you using a DB at all? – Aidan Kane Jan 15 '20 at 10:42

In Python, you should avoid explicit for loops as much as you can, for many reasons. Even in machine learning (neural network) work, it is better to use vectorization instead of explicit for loops. In your case I suggest using "map", as it is effectively a for loop written in C.
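Here is a rough sketch of both ideas applied to this case (NumPy is an extra dependency not mentioned in the question, and the data layout is assumed to match the loops above):

import numpy as np

# map(): the iteration machinery runs in C, though the lambda itself is still Python
def total_with_map(group):
    return sum(
        sum(map(lambda p: p.price * p.amount, customer))
        for customer in group
    )

# Vectorization: pull prices and amounts into arrays once,
# then let NumPy do the multiply-and-sum in bulk
def total_with_numpy(group):
    prices = np.array([p.price for c in group for p in c])
    amounts = np.array([p.amount for c in group for p in c])
    return float((prices * amounts).sum())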

Shahryar
  • I've never heard of this. Can you point to a resource for this? – Aidan Kane Jan 15 '20 at 08:41
  • I gave you the link in "map" – click on it and you will be redirected to the official Python documentation. For vectorization, the link is below: – Shahryar Jan 15 '20 at 16:46
  • https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html – Shahryar Jan 15 '20 at 16:46
  • Right, but it doesn't say to use map instead of for loops. And I know what vectorisation is, though it adds a layer of complexity that's not generally needed. I've never heard anybody say that you shouldn't use for loops in Python. That's definitely not the issue here. – Aidan Kane Jan 16 '20 at 00:47
  • 1
    @AidanKane please take a look at this. https://stackoverflow.com/questions/8097408/why-python-is-so-slow-for-a-simple-for-loop – Shahryar Jan 16 '20 at 06:18
  • We were told that the dataset is much smaller than that though (100 × 200), so by my reckoning the operation using for loops should take under 2ms. The question says it's very slow. I think we're both trying to attack different questions by starting with different assumptions about how big the data is and how slow the operation is. My guess is that they're actually hitting a DB during the operations without knowing it. But really, unless @pam provides more info we'll never get to the bottom of it. – Aidan Kane Jan 16 '20 at 07:27
  • Yes, I thought that too. – Shahryar Jan 16 '20 at 13:12