I tried to optimize the code below, but I cannot figure out how to improve its speed. It takes almost 30 seconds to run, and I believe most of that time is spent indexing the bootsam and filedata matrices. Can someone please help me optimize this code? Is it possible to improve the performance?

import numpy as np
filedata=np.genfromtxt('monthlydata1970to2010.txt',dtype='str') # this will create a 980 * 7 matrix
nboot=5000  
results=np.zeros((11,nboot));   #this will create 11*5000 matrix  
results[0,:]=600  
horizon=360  
balance=200  
bootsam=np.random.randint(984, size=(984, nboot)) # this will create 984*5000 matrix
for bs in range(0,nboot):  
   for mn in range(1,horizon+1):  
        if mn%12 ==1:  
            bondbal = 24*balance  
            sp500bal=34*balance  
            russbal = 44*balance  
            eafebal=55*balance  
            cashbal =66*balance  
            bondbal=bondbal*(1+float(filedata[bootsam[mn-1,bs]-1,2]))  
            sp500bal=sp500bal*(1+float(filedata[bootsam[mn-1,bs]-1,3]))  
            russbal=russbal*(1+float(filedata[bootsam[mn-1,bs]-1,4]))  
            eafebal=eafebal*(1+float(filedata[bootsam[mn-1,bs]-1,5]))  
            cashbal=cashbal*(1+float(filedata[bootsam[mn-1,bs]-1,6]))  
            balance=bondbal + sp500bal + russbal + eafebal + cashbal  
        else:  
            bondbal=bondbal*(1+float(filedata[bootsam[mn-1,bs]-1,2]))
            sp500bal=sp500bal*(1+float(filedata[bootsam[mn-1,bs]-1,3]))
            russbal=russbal*(1+float(filedata[bootsam[mn-1,bs]-1,4]))
            eafebal=eafebal*(1+float(filedata[bootsam[mn-1,bs]-1,5]))
            cashbal=cashbal*(1+float(filedata[bootsam[mn-1,bs]-1,6]))
            balance=bondbal + sp500bal + russbal + eafebal + cashbal
            if mn == 60:
               results[1,bs]=balance
            if mn == 120: 
               results[2,bs]=balance
            if mn == 180:
               results[3,bs]=balance
            if mn == 240:
               results[4,bs]=balance
            if mn == 300: 
               results[5,bs]=balance  
Invincible
  • `1+float(100)` is the same thing as `101.` – mgilson Mar 01 '13 at 05:05
  • It would probably help if you said what you were trying to do with the code instead of asking how to improve it. – Hunter McMillen Mar 01 '13 at 05:07
  • use `timeit` module if you want to check the amount of time it is taking – avasal Mar 01 '13 at 05:08
  • While `timeit` is the best way to time something, if you're getting times on the order of 7s using `datetime.now()`, `timeit` will likely say the same thing. – mgilson Mar 01 '13 at 05:11
  • @HunterMcMillen: Basically, I am converting some MATLAB code to Python. I cannot post the actual code because it is confidential. In `(1+float(100))`, the 100 comes from a two-dimensional string matrix, which is why I used `float` to convert the string value. – Invincible Mar 01 '13 at 05:13
  • @mgilson: Here `(1+float(100))` would be `(1+float(twodmatrix[bs,mn]))`. – Invincible Mar 01 '13 at 05:14
  • If you are dealing with matrices, consider using numpy. It can also read simple CSV type files into matrices. – M456 Mar 01 '13 at 05:22
  • Try moving the final computation outside of the two for loops; it seems to serve no purpose except at the end. – Mike Mar 01 '13 at 05:23
  • Use `xrange` instead of `range`, and try using generators (`yield`) in your for loop if it makes sense. Might help a bit. – zz3599 Mar 01 '13 at 05:23
  • @Mike: I haven't written the full code here. I have to compute the balance inside every loop iteration. Any other ideas? – Invincible Mar 01 '13 at 05:50
  • I think the only reason this code runs as fast as it does now is that at some early point all balances are converted to `inf`. Using integers and taking the powers directly without a loop takes 1.5 times longer. – Octipi Mar 01 '13 at 05:55
  • you can also use `itertools` for the two nested loops: `for bs, mn in itertools.product(np.arange(nboot), np.arange(1,horizon+1))` – Francesco Montesano Mar 01 '13 at 09:48

2 Answers

Basic algebra: executing x = x * 1.23 a total of 360 times can be replaced by a single execution of

x = x * (1.23 ** 360)

Refactor your code and you'll see that the loops are not really needed.
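As a minimal sketch of that identity, the snippet below compares the explicit loop with the closed form. The function names are hypothetical and chosen only for illustration. Note that the closed form assumes the same factor every month; in the question each month uses a different sampled return, so the analogous shortcut there is a product over the sampled factors, which the third helper illustrates with NumPy.

```python
import math

import numpy as np


def grow_by_loop(x, factor, periods):
    """Multiply x by the same factor `periods` times, one step at a time."""
    for _ in range(periods):
        x = x * factor
    return x


def grow_closed_form(x, factor, periods):
    """Same result in one step: x * factor ** periods."""
    return x * factor ** periods


def grow_varying(x, factors):
    """When the factor changes every period, take the product of all factors."""
    return x * np.prod(factors)


if __name__ == "__main__":
    looped = grow_by_loop(200.0, 1.23, 360)
    direct = grow_closed_form(200.0, 1.23, 360)
    print(math.isclose(looped, direct, rel_tol=1e-9))
```

The two results agree only up to floating-point rounding, so compare them with a tolerance rather than with `==`.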

Ron Klein

It is difficult to answer without seeing the real code. I can't get your sample working because balance becomes inf early in the run, as noted in the comments on the question. Still, one obvious optimization is to stop reading the bootsam[mn-1,bs] element five times per iteration when computing the xxbal variables. All of those variables use the same bootsam element, so read it once and reuse it:

# range assumes Python 3; the original answer used xrange (Python 2)
for bs in range(0, nboot):
    for mn in range(1, horizon+1):
        # look up the sampled row once per iteration and reuse it below
        row = bootsam[mn-1, bs] - 1
        if (mn % 12) == 1:
            bondbal = 24*balance
            sp500bal = 34*balance
            russbal = 44*balance
            eafebal = 55*balance
            cashbal = 66*balance

            bondbal = bondbal*(1+float(filedata[row, 2]))
            sp500bal = sp500bal*(1+float(filedata[row, 3]))
            russbal = russbal*(1+float(filedata[row, 4]))
            eafebal = eafebal*(1+float(filedata[row, 5]))
            cashbal = cashbal*(1+float(filedata[row, 6]))
            balance = bondbal + sp500bal + russbal + eafebal + cashbal
        else:
            bondbal = bondbal*(1+float(filedata[row, 2]))
            sp500bal = sp500bal*(1+float(filedata[row, 3]))
            russbal = russbal*(1+float(filedata[row, 4]))
            eafebal = eafebal*(1+float(filedata[row, 5]))
            cashbal = cashbal*(1+float(filedata[row, 6]))
            # update the total here as well, as the question's code does
            balance = bondbal + sp500bal + russbal + eafebal + cashbal

The optimized code (which uses a fake value for balance) runs nearly twice as fast as the original on my old Acer Aspire.

Update

If you need further optimizations you can do at least two more things:

  • Do not add 1 and convert to float every time an element of filedata is accessed. Instead, add 1 to the whole array once, when it is created, and give it a float datatype.
  • Do not use arithmetic expressions that mix NumPy scalars and built-in numbers, because that arithmetic is slower than arithmetic on plain Python floats (you can read more on this problem in this SO thread).
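To make the two points above concrete, here is a small sketch using a hypothetical two-by-two string array as a stand-in for the real file contents. It converts and adds 1 once, then shows that indexing the NumPy array yields a NumPy scalar while `.tolist()` yields built-in Python floats.

```python
import numpy as np

# Hypothetical stand-in for the string matrix returned by
# np.genfromtxt(..., dtype='str'); the real file is not available here.
raw = np.array([["0.01", "-0.02"], ["0.03", "0.00"]])

# Convert to float once and add 1 once, instead of on every access.
growth = 1.0 + raw.astype(float)

# Nested lists of built-in Python floats.
growth_list = growth.tolist()

numpy_scalar = growth[0, 0]        # a NumPy scalar (np.float64)
python_scalar = growth_list[0][0]  # a built-in float

print(type(numpy_scalar).__name__, type(python_scalar).__name__)
```

Whether the list version is actually faster for a given loop is best confirmed by timing both variants on the real data.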

The following code applies both suggestions:

import numpy as np

filedata = np.genfromtxt('monthlydata1970to2010.txt', dtype='str')  # this will create a 980 * 7 matrix
# Convert once and add 1 once; .tolist() yields built-in Python floats.
# (np.float was removed in NumPy 1.24; the built-in float is equivalent.)
my_list = (1.0 + filedata.astype(float)).tolist()
nboot = 5000
results = np.zeros((11, nboot))   # this will create an 11*5000 matrix
results[0, :] = 600
horizon = 360
balance = 200
# 984*5000 matrix; the upper bound 5 is as posted in this answer (the question uses 984)
bootsam = np.random.randint(5, size=(984, nboot))
# range assumes Python 3; the original answer used xrange (Python 2)
for bs in range(0, nboot):
    for mn in range(1, horizon+1):
        row = int(bootsam[mn-1, bs] - 1)  # plain Python int for list indexing
        if (mn % 12) == 1:
            bondbal = 24*balance
            sp500bal = 34*balance
            russbal = 44*balance
            eafebal = 55*balance
            cashbal = 66*balance

            bondbal = bondbal*(my_list[row][2])
            sp500bal = sp500bal*(my_list[row][3])
            russbal = russbal*(my_list[row][4])
            eafebal = eafebal*(my_list[row][5])
            cashbal = cashbal*(my_list[row][6])
            balance = bondbal + sp500bal + russbal + eafebal + cashbal
        else:
            bondbal = bondbal*(my_list[row][2])
            sp500bal = sp500bal*(my_list[row][3])
            russbal = russbal*(my_list[row][4])
            eafebal = eafebal*(my_list[row][5])
            cashbal = cashbal*(my_list[row][6])
            balance = bondbal + sp500bal + russbal + eafebal + cashbal

With those changes the code runs nearly twice as fast as the previously optimized version.
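A different technique from the one above, offered only as a hedged sketch: because every bootstrap sample goes through the same month-by-month steps, the loop over samples can be replaced by NumPy array operations, leaving just the loop over months. Everything named here is an assumption for illustration: the function `simulate_balances`, the synthetic growth data, weights that sum to 1 (the question's 24, 34, 44, 55, 66 multipliers overflow to inf), and sampled indices that are already 0-based rather than shifted by 1 as in the question.

```python
import numpy as np


def simulate_balances(growth, bootsam, weights, start_balance, horizon):
    """Final balance per bootstrap path, vectorized over paths.

    growth  : (n_rows, n_assets) array holding 1 + monthly return
    bootsam : (>= horizon, nboot) array of 0-based row indices into growth
    weights : (n_assets,) allocation applied at each 12-month rebalance
    """
    nboot = bootsam.shape[1]
    balance = np.full(nboot, float(start_balance))
    assets = np.zeros((nboot, len(weights)))
    for mn in range(1, horizon + 1):
        if mn % 12 == 1:
            # Rebalance: split each path's total balance across the assets.
            assets = balance[:, None] * weights[None, :]
        rows = bootsam[mn - 1]          # one sampled row index per path
        assets = assets * growth[rows]  # shape (nboot, n_assets)
        balance = assets.sum(axis=1)
    return balance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data standing in for the real file.
    demo_growth = 1.0 + rng.normal(0.005, 0.02, size=(984, 5))
    demo_bootsam = rng.integers(0, 984, size=(984, 5000))
    demo_weights = np.array([0.2, 0.3, 0.2, 0.2, 0.1])
    final = simulate_balances(demo_growth, demo_bootsam, demo_weights, 200.0, 360)
    print(final.shape)
```

This runs 360 array operations instead of 360 * 5000 scalar iterations. Intermediate snapshots (months 60, 120, and so on, as in the question's results matrix) could be stored inside the month loop in the same way.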

Vicent
  • Thanks for looking into the code. Your comments really saved me time. I have now replaced the repeated lookup with the row variable, and the code runs in 15 sec. instead of 30 sec. – Invincible Mar 03 '13 at 04:14
  • I have run the code on my local machine and it executes correctly. Can you please optimize it further? – Invincible Mar 03 '13 at 04:15
  • You are awesome. We started from 28 sec. and have now reached 4 sec. Thanks once again. – Invincible Mar 03 '13 at 17:23