I have some python code that has many classes. I used cProfile
to find that the total time to run the program is 68 seconds. I found that the following function in a class called Buyers
takes about 60 seconds of those 68 seconds. I have to run the program about 100 times, so any increase in speed will help. Can you suggest ways to increase the speed by modifying the code? If you need more information that will help, please let me know.
def qtyDemanded(self, timePd, priceVector):
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Inputs: timePd and priceVector
Output: count of people for whom priceVector[-1] < utility
'''
## Initialize count of customers to zero
## Set self.customers and self.nonCustomers to empty lists
price = priceVector[-1]
count = 0
self.customers = []
self.nonCustomers = []
for person in self.people:
if person.utility >= price:
person.customer = 1
self.customers.append(person)
else:
person.customer = 0
self.nonCustomers.append(person)
return len(self.customers)
self.people
is a list of person
objects. Each person
has customer
and utility
as its attributes.
EDIT - responsed added
-------------------------------------
Thanks so much for the suggestions. Here is the response to some questions and suggestions people have kindly made. I have not tried them all, but will try others and write back later.
(1) @amber - the function is accessed 80,000 times.
(2) @gnibbler and others - self.people is a list of Person objects in memory. Not connected to a database.
(3) @Hugh Bothwell
cumtime taken by the original function - 60.8 s (accessed 80000 times)
cumtime taken by the new function with local function aliases as suggested - 56.4 s (accessed 80000 times)
(4) @rotoglup and @Martin Thomas
I have not tried your solutions yet. I need to check the rest of the code to see the places where I use self.customers before I can make the change of not appending the customers to self.customers list. But I will try this and write back.
(5) @TryPyPy - thanks for your kind offer to check the code.
Let me first read a little on the suggestions you have made to see if those will be feasible to use.
EDIT 2
Some suggested that since I am flagging the customers and noncustomers in the self.people
, I should try without creating separate lists of self.customers
and self.noncustomers
using append. Instead, I should loop over the self.people
to find the number of customers. I tried the following code and timed both functions below f_w_append
and f_wo_append
. I did find that the latter takes less time, but it is still 96% of the time taken by the former. That is, it is a very small increase in the speed.
@TryPyPy - The following piece of code is complete enough to check the bottleneck function, in case your offer is still there to check it with other compilers.
Thanks again to everyone who replied.
import numpy
class person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class population(object):
def __init__(self, numpeople):
self.people = []
self.cus = []
self.noncus = []
numpy.random.seed(1)
utils = numpy.random.uniform(0, 300, numpeople)
for u in utils:
per = person(u)
self.people.append(per)
popn = population(300)
def f_w_append():
'''Function with append'''
P = 75
cus = []
noncus = []
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
return len(cus)
def f_wo_append():
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
EDIT 3: It seems numpy is the problem
This is in response to what John Machin said below. Below you see two ways of defining Population
class. I ran the program below twice, once with each way of creating Population
class. One uses numpy and one does not use numpy. The one without numpy takes similar time as John found in his runs. One with numpy takes much longer. What is not clear to me is that the popn
instance is created before time recording begins (at least that is what it appears from the code). Then, why is numpy version taking longer. And, I thought numpy was supposed to be more efficient. Anyhow, the problem seems to be with numpy and not so much with the append, even though it does slow down things a little. Can someone please confirm with the code below? Thanks.
import random # instead of numpy
import numpy
import time
timer_func = time.time # using Mac OS X 10.5.8
class Person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class Population(object):
def __init__(self, numpeople):
random.seed(1)
self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
self.cus = []
self.noncus = []
# Numpy based
# class Population(object):
# def __init__(self, numpeople):
# numpy.random.seed(1)
# utils = numpy.random.uniform(0, 300, numpeople)
# self.people = [Person(u) for u in utils]
# self.cus = []
# self.noncus = []
def f_wo_append(popn):
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
t0 = timer_func()
for i in xrange(20000):
x = f_wo_append(popn)
t1 = timer_func()
print t1-t0
Edit 4: See the answers by John Machin and TryPyPy
Since there have been so many edits and updates here, those who find themselves here for the first time may be a little confused. See the answers by John Machin and TryPyPy. Both of these can help in improving the speed of the code substantially. I am grateful to them and others who alerted me to slowness of append
. Since, in this instance I am going to use John Machin's solution and not use numpy for generating utilities, I am accepting his response as an answer. However, I really appreciate the directions pointed out by TryPyPy also.