I've got a function that takes several parameters as input and looks up the matching cell in a reference table (using the .loc function). This function is part of a larger function, but when I profiled my code I realised that 99% of the time was spent on the .loc lookup, and I don't know if it's possible to speed this up.
The reference table on which the .loc is performed has about 500k rows. Some columns contain strings, others contain floats.
Here's the profiled code:
Timer unit: 1e-06 s
Total time: 0.041261 s
File: <ipython-input-106-62a9b3c7d0c0>
Function: convert_position at line 38
Line # Hits Time Per Hit % Time Line Contents
==============================================================
38 def convert_position(transcript, exon, delta, genome=gtf_test):
39
40
41 1 41259.0 41259.0 100.0 start = genome.loc[(genome['transcript_id'].values == transcript) & (genome['exon_number'].values == str(exon)), 'Start'].item()
42
43 1 2.0 2.0 0.0 position = start + delta
44
45 1 0.0 0.0 0.0 return position
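For reference, here's a self-contained version of the function with a toy stand-in for the reference table (the transcript IDs and values below are made up, just to make it runnable):

import pandas as pd

# Toy stand-in for the real ~500k-row reference table; the real table
# mixes string and float columns as described above.
gtf_test = pd.DataFrame({
    'transcript_id': ['ENST0001', 'ENST0001', 'ENST0002'],
    'exon_number': ['1', '2', '1'],
    'Start': [100, 500, 900],
})

def convert_position(transcript, exon, delta, genome=gtf_test):
    # Boolean-mask lookup that dominates the runtime in the profile above
    start = genome.loc[
        (genome['transcript_id'].values == transcript)
        & (genome['exon_number'].values == str(exon)),
        'Start'
    ].item()
    return start + delta

print(convert_position('ENST0001', 2, 10))  # -> 510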
From what I could find, it looks like this is about the fastest I can get using .loc, but maybe there's an alternative that doesn't rely on .loc and would be even faster?
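For example, I wondered whether precomputing a hash-based lookup once, outside the function, would avoid scanning all 500k rows on every call. Something like this untested sketch, which assumes each (transcript_id, exon_number) pair appears exactly once in the table:

# Build the lookup once from the toy table above; amortised over
# millions of calls the setup cost should be negligible.
# Note: unlike .item(), to_dict() would silently keep the last row
# if a (transcript_id, exon_number) pair were duplicated.
start_lookup = gtf_test.set_index(['transcript_id', 'exon_number'])['Start'].to_dict()

def convert_position_indexed(transcript, exon, delta, lookup=start_lookup):
    # Plain dict lookup keyed by (transcript_id, exon_number)
    return lookup[(transcript, str(exon))] + delta

Would that be the right direction, or is there something better?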
The main function (which calls this one) is applied to a dataframe column with 8M rows, so even a small decrease in computing time adds up to a lot of saved time.
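For context, the calling pattern looks roughly like this (the dataframe and column names here are placeholders, not my real ones):

# Hypothetical 8M-row input; in reality this dataframe is much larger
positions_df = pd.DataFrame({
    'transcript_id': ['ENST0001', 'ENST0002'],
    'exon_number': [2, 1],
    'delta': [10, 25],
})
# One convert_position call per row, hence the sensitivity to lookup cost
positions_df['genomic_position'] = positions_df.apply(
    lambda row: convert_position(row['transcript_id'],
                                 row['exon_number'],
                                 row['delta']),
    axis=1,
)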
Thanks in advance for your help!