
I have a >100MB file that needs to be read with numpy.loadtxt()

The reading part is the main bottleneck in my code. For a 72MB file it takes 17.3s.

Is it somehow possible to read a file in parallel using loadtxt()?

If possible, without splitting the file.


1 Answer


It looks like numpy.loadtxt() is your problem.

http://wesmckinney.com/blog/?p=543

http://codrspace.com/durden/performance-lessons-for-reading-ascii-files-into-numpy-arrays/

According to these sites, you're better off not using numpy's load function at all.

pandas.read_csv and pandas.read_table from the pandas module should be helpful.
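As a minimal sketch of the idea, assuming a whitespace-delimited numeric file with no header (the in-memory sample here stands in for your large file; for a real file, pass its path instead of the StringIO object):

```python
import io
import numpy as np
import pandas as pd

# Small in-memory sample standing in for the large whitespace-delimited file.
sample = "1.0 2.0 3.0\n4.0 5.0 6.0\n"

# pandas' C parser is typically much faster than numpy.loadtxt on big files.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
data = df.to_numpy()  # plain numpy ndarray, like loadtxt would return
print(data.shape)  # (2, 3)
```

The speedup comes from pandas parsing the text in C rather than in a Python loop, so it does not require splitting the file or reading it in parallel.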
