How to improve the speed of specific routines?

Question

I'm trying to work with some basic actuarial mathematics in Python.

I have a data base consisting of 1000+ person and their info for pension.

In this problem, I'm working with these variables: m_age: Age of the insured person in months m_tmpness: Temporariness of the benefit, if temporary. m_tmbenef: Time elapsed, in months, from the start of the benefit. m_interest: Interest rate for the benefit. tableid: ID number for an actuarial table. mytable_n: qx(probability of dying before the end of age x) from an actuarial table. I have several tables, so it's mytable_1, mytable_2, ... mytable_n

In[2]: m_age
Out[2]: [877, 877, 797, 797, 794]

In[3]: m_tmpness
Out[3]: [240, 240, 0, 120, 120]

In[4]: m_tmbenef
Out[4]: [101, 28, 0, 118, 118]

In[5]: m_interest
Out[5]: [0.0016515813019202241,
 0.0016515813019202241,
 0.0023039138595752906,
 0.0040741237836483535,
 0.0040741237836483535]

In[6]: mytable_1
Out[6]:
0       0.000337
1       0.000337
2       0.000337
...
1500    1.000000
Name: at49m, Length: 1501, dtype: float64

I have calculated lx (number of people living at age x) values for each table, to support calculating Nx and Dx. I have to calculate Dx and Nx for each person in my data base, according to their data. Dx is simply lx * 1/(1+interest)^x. Nx is the sum of all Dx values from a certain point. If x = 0, Nx = D0 + D1 + D2 + ... + Dn. If x = 50, Nx = D50 + D51 + ... + Dn. It's absolutely easy to calculate Dx for each person, but I'm struggling with it because I need all values from Dx until certain age for each person to calculate their Nx at a certain age.

So, that's what I've been trying so far:

import pandas
lx_mytable_1 = [100000 if i==0 else 0 for i in range(len(mytable_1))]
for i in range(len(mytable_1)):
lx_mytable_1[i] = lx_mytable_1[i-1]*(1-mytable_1[i-1])
### Replicate it to n tables

### Dx and Nx
def Dx(x,lx,qx,interest):
    D_x = [((1/(1+interest))**i)*lx[i] for i in range(len(qx))]
    return(D_x[x])

def Nx(x,lx,qx,interest):
    N_x = 0
    for i in range(len(qx)):
        N_x = N_x + Dx(x=i,lx=lx,qx=qx,interest=interest)
    return(N_x)

### And one should run it like this, for example:
### Nx(x=100,lx=lx_mytable_1,qx=mytable_1,interest=m_interest)

aux_NX = [0 for i in range(len(tableid))]
for i in range(len(tableid)):
    if (tableid[i] == 0):
        aux_NX[i] = 0.0
    else:
        if (tableid[i] == 1):
            aux_NX[i] = Nx(x=PBCIDADE[i],lx=lx_mytable_1,qx=mytable_1,interest=m_interest[i])
        else:
            if (tableid[i] == 2):
                aux_NX[i] = Nx(x=PBCIDADE[i],lx=lx_mytable_2,qx=mytable_2,interest=m_interest[i])
            else:
                if (tableid[i] == 3):
                    aux_NX[i] = Nx(x=PBCIDADE[i],lx=lx_mytable_3,qx=mytable_3,interest=m_interest[i])
                else:
                    if (tableid[i] == 4):
                        aux_NX[i] = Nx(x=PBCIDADE[i],lx=lx_mytable_4,qx=mytable_4,interest=m_interest[i])
                    else:
                        if (tableid[i] == 5):
                            aux_NX[i] = Nx(x=PBCIDADE[i],lx=lx_mytable_5,qx=mytable_5,interest=m_interest[i])
                        ### And as many elses and ifs as necessary... Currently I'm using 15 tables.

When I run it for a single line, that's fine. But when I run it for the 1000+ lines, it can take hours to run properly. Probably it's because I'm calling a for loop in Nx, and Dx is using another for loop, with 1500 iterations in Dx and Nx...

My question is: is there a computationally faster way to do the same? How?

score 0 · Answer 1 · answered Oct 04 '19 at 00:46

I recommend some testing with the timeit module.

You can then find out exactly which parts of your code are taking a long time to run, and which are running quickly. You can then use this knowledge to ask a more specific question about how to optimise your code.

Quickly scanning your code, I would recommend using a dictionary to store your lx_mytable_2 tables. Then you can just refer to them as lx_mytable[i] and use the same index in your existing for loop, rather than 10 if else statements. This would make your code much cleaner and you wouldn't have to write a new line of code as your data grows.

As a side note, try using the elif syntax rather than else: if: on separate lines.

How to improve the speed of specific routines?

My question is: is there a computationally faster way to do the same? How?

1 Answers1