I have a txt file with the following structure:

@<TRIPOS>MOLECULE
2bsm_lig.pdb
45 47 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
  1 CL25       43.5837   12.4179   37.4396    Cl      1  BSM1       -0.1770
  2 N1         40.4187    9.0729   42.8516    N.ar    1  BSM1        0.2996
  3 H1         40.0025    9.0411   43.7713    H       1  BSM1        0.2700

The first rows are just the header. Forget them.

The second, third, and fourth columns are the (x, y, z) spatial coordinates, which I would like to store in a one-dimensional vector, like this:

[43.5837, 12.4179, 37.4396, 40.4187, 9.0729, 42.8516, 40.0025, 9.0411, 43.7713]

I already did this using pandas (on Python 3), but it is too slow for my program, since this operation is going to be executed in a large loop. Do you know the most efficient way to read those data and store them in one array? They could be stored in a single list or a NumPy array.
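For illustration, a minimal pandas-based sketch of the extraction described above. The file name, the number of header lines to skip, and the 0-indexed column positions are assumptions based on the sample shown:

import pandas as pd

df = pd.read_csv(
    "2bsm_lig.txt",      # hypothetical file name
    sep=r"\s+",          # columns separated by runs of whitespace
    skiprows=7,          # skip everything up to and including @<TRIPOS>ATOM
    header=None,
    usecols=[2, 3, 4],   # the x, y, z columns
)
coords = df.to_numpy().ravel()  # flatten row-wise into a 1-D array
# coords -> array([43.5837, 12.4179, 37.4396, 40.4187, ...])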


1 Answer

This has already been answered in detail here and here.

pandas has the flexibility to read large data sets in small chunks using the chunksize parameter.

You can try something like:

import pandas as pd

# Returns a TextFileReader object for iteration; see the pandas IO Tools docs for more.
chunk_df = pd.read_csv(<your_file_here>, iterator=True, chunksize=10000)

Official documentation.
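For example, a minimal sketch combining the chunked reader with the coordinate extraction from the question (the file name, the skiprows value, and the column positions are assumptions based on the sample in the question):

import numpy as np
import pandas as pd

reader = pd.read_csv(
    "2bsm_lig.txt",      # hypothetical file name
    sep=r"\s+",
    skiprows=7,          # skip the @<TRIPOS> header block
    header=None,
    usecols=[2, 3, 4],   # the x, y, z columns
    iterator=True,
    chunksize=10000,
)
# Flatten each chunk row-wise and join them into one 1-D array.
coords = np.concatenate([chunk.to_numpy().ravel() for chunk in reader])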
