
I've got the following code in Python and I think I need some help optimizing it.
I'm reading in a few million lines of data, but then throwing out most of them because one coordinate per line doesn't fit my criterion.
The code is as follows:

import numpy as np

def loadFargoData(dataname, thlimit):
    temp = np.loadtxt(dataname)                 # parse the whole file into an array
    return temp[ np.abs(temp[:,1]) < thlimit ]  # keep only rows whose column-1 value is within thlimit

I've coded it as if it were C-style code, and of course in Python this is painfully slow.
Can I avoid allocating my temp object somehow? Or what other optimizations can the Pythonic population help me with?

  • May be a duplicate of this question: http://stackoverflow.com/questions/14645789/numpy-reading-file-with-filtering-lines-on-the-fly – Zefick Apr 12 '17 at 12:27
  • @Zefick: Thanks for the link. Indeed that would solve my problem, if it is possible to construct regular expressions that mimic mathematical operations like \ge... Is that possible? – AtmosphericPrisonEscape Apr 12 '17 at 12:42
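
Following up on the linked approach in the comments: rather than regular expressions, a plain generator can apply the numeric cut to each line before NumPy parses it. This is only a minimal sketch, assuming whitespace-separated columns with the coordinate in column 1; loadFargoDataFiltered is a hypothetical name, not code from the question.

import numpy as np

def loadFargoDataFiltered(dataname, thlimit):
    # Generator that yields only the lines whose second column passes the cut,
    # so rows that fail the criterion never reach the final array.
    def keep(fname):
        with open(fname) as f:
            for line in f:
                if abs(float(line.split()[1])) < thlimit:
                    yield line
    # np.loadtxt accepts any iterable of lines, not just a filename
    return np.loadtxt(keep(dataname))

Whether this beats a faster reader depends on how many rows survive the cut, since each line is still split in pure Python.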

1 Answer


The data reader included in pandas might speed up your script; it reads faster than numpy's loadtxt. Pandas produces a DataFrame object that is easy to view as (and convert to) a numpy array, so you can apply your condition in numpy exactly as in your question (that part already looks efficient enough).

import numpy as np
import pandas as pd

def loadFargoData(dataname, thlimit):
    temp = pd.read_csv(dataname)  # returns a DataFrame
    temp = temp.values            # returns a numpy array
    # the 2 lines above can be replaced by:  temp = pd.read_csv(dataname).values
    return temp[ np.abs(temp[:,1]) < thlimit ]

You might want to check pandas' documentation to learn about the function arguments you may need to read the file correctly (header, separator, etc.).
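
For example, if the file turns out to be whitespace-delimited with no header row (an assumption, not something stated in the question), the call might look like this:

temp = pd.read_csv(dataname, sep=r'\s+', header=None).values  # regex separator, no header line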

jberrio
  • Sorry for the late accepting of your answer, I was busy with another project. The pandas reader in fact works wonderfully; I get a speed-up by a factor of 15-16 for my data. – AtmosphericPrisonEscape Apr 25 '17 at 19:18