I have a txt file with the following structure:

@<TRIPOS>MOLECULE
2bsm_lig.pdb
45 47 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
  1 CL25       43.5837   12.4179   37.4396    Cl      1  BSM1       -0.1770
  2 N1         40.4187    9.0729   42.8516    N.ar    1  BSM1        0.2996
  3 H1         40.0025    9.0411   43.7713    H       1  BSM1        0.2700

The first rows are just the header. Forget them.

The second, third, and fourth columns are the (x, y, z) spatial coordinates, which I would like to store in a one-dimensional vector, like this:

[43.5837, 12.4179, 37.4396, 40.4187, 9.0729, 42.8516, 40.0025, 9.0411, 43.7713]

I already did this using pandas (on Python 3), but it is too slow for my program, since this operation is going to be executed in a large loop. Do you know the most efficient way to read those data and store them in one array? They could be stored in a single list or a NumPy array.
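For illustration, a minimal pandas-based sketch of the extraction described above. The file name, the number of header lines to skip, and the 0-indexed column positions are assumptions based on the sample shown:

import pandas as pd

df = pd.read_csv(
    "2bsm_lig.txt",      # hypothetical file name
    sep=r"\s+",          # columns separated by runs of whitespace
    skiprows=7,          # skip everything up to and including @<TRIPOS>ATOM
    header=None,
    usecols=[2, 3, 4],   # the x, y, z columns
)
coords = df.to_numpy().ravel()  # flatten row-wise into a 1-D array
# coords -> array([43.5837, 12.4179, 37.4396, 40.4187, ...])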


1 Answer

This has already been answered in detail here and here.

pandas has the flexibility to read large data sets in small chunks using the chunksize parameter.

You can try something like:

import pandas as pd

# Returns a TextFileReader object for iteration; see the pandas IO Tools docs for more.
chunk_df = pd.read_csv(<your_file_here>, iterator=True, chunksize=10000)

Official documentation.
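For example, a minimal sketch combining the chunked reader with the coordinate extraction from the question (the file name, the skiprows value, and the column positions are assumptions based on the sample in the question):

import numpy as np
import pandas as pd

reader = pd.read_csv(
    "2bsm_lig.txt",      # hypothetical file name
    sep=r"\s+",
    skiprows=7,          # skip the @<TRIPOS> header block
    header=None,
    usecols=[2, 3, 4],   # the x, y, z columns
    iterator=True,
    chunksize=10000,
)
# Flatten each chunk row-wise and join them into one 1-D array.
coords = np.concatenate([chunk.to_numpy().ravel() for chunk in reader])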
