
I have a large array of size ~(10^8 x 5). I want to write this data to a file for use in another program.

Following the advice in this StackOverflow answer, I am writing to disk just once using numpy.savetxt, rather than calling write() repeatedly. However, given the size of my data, this is slow and produces a very large file (~47 GB).

I realise that I can use something like numpy.save, but I want to be able to read the output file with Fortran. The file does not need to be human-readable.

What are the best options or best practices here?

Thanks

Some example code:

    import numpy as np
    from scipy.interpolate import interp1d

    # Load the data
    data = np.loadtxt('path/to/file/file.txt')

    # Extract the data - let's just say it is x, y, z coordinates
    x = data[:, 0]
    y = data[:, 1]
    z = data[:, 2]

    # Do some interpolation to increase the 'resolution' of the xyz coordinates

    # The baseline is simply the sample index 0, 1, ..., len(x) - 1
    baseline = np.arange(len(x), dtype=float)

    N = 10000  # set resolution relative to baseline
    IntBaseline = np.linspace(0, baseline[-1], len(baseline) * N)

    # Build one linear interpolator per coordinate
    gx = interp1d(baseline, x)
    gy = interp1d(baseline, y)
    gz = interp1d(baseline, z)

    interpolated_x = gx(IntBaseline)
    interpolated_y = gy(IntBaseline)
    interpolated_z = gz(IntBaseline)

    # Now write everything to an array and save

    outfile = np.zeros((len(interpolated_x), 3))
    outfile[:, 0] = interpolated_x
    outfile[:, 1] = interpolated_y
    outfile[:, 2] = interpolated_z

    np.savetxt('Interpolated_Outfile.txt', outfile)
user1887919
  • It depends on how the data is produced. Without details of what you are doing, this can't readily be answered (note that the post you link to has specific code, leading to the `numpy.savetxt` answer there). – Martijn Pieters Apr 25 '17 at 12:48
  • Consider using the HDF5 format - it's pretty fast and AFAIK is supported in Fortran. If your array can be converted to a pandas DataFrame, then it's going to be very easy. – MaxU - stand with Ukraine Apr 25 '17 at 12:51
  • Maybe `numpy.ndarray.tofile` ? Whatever route you go, make sure it's a binary route. – High Performance Mark Apr 25 '17 at 12:53
  • Thanks all for comments - I have now given an example of the code if this helps. I will investigate HDF5 and `np.ndarray.tofile`. – user1887919 Apr 25 '17 at 13:10
  • Both `loadtxt` and `savetxt` read/write the text file line by line. `savetxt` just iterates on the rows of your array, formats and writes them. `pandas` has a faster `csv` reader; I don't know about its writer. But it's not going to help with the interpolation. I don't see how `tofile` is going to help with Fortran read; it's just as Python specific as `np.save`. – hpaulj Apr 25 '17 at 15:55
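
To make the binary route suggested in the comments concrete, here is a minimal sketch using `ndarray.tofile`, which writes the raw values with no header. It uses a small random array as a stand-in for the real data, and the file names are hypothetical. It assumes the Fortran side can open the file with unformatted stream access (`access='stream'`, Fortran 2003) and uses the same 8-byte reals and byte order as the machine that wrote the file; none of that comes from the original post.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the real (~10^8 x 5) array; the shape is illustrative only.
rng = np.random.default_rng(0)
outfile = rng.random((1000, 5))

tmpdir = tempfile.mkdtemp()
bin_path = os.path.join(tmpdir, "interpolated.bin")  # hypothetical file name
txt_path = os.path.join(tmpdir, "interpolated.txt")  # hypothetical file name

# Raw float64 values, no header, native byte order.
# tofile always writes in C (row-major) order, so a Fortran program reading
# the same bytes would see them as an array declared with shape (5, nrows).
outfile.astype(np.float64).tofile(bin_path)

# Text version, written only to compare file sizes.
np.savetxt(txt_path, outfile)

bin_size = os.path.getsize(bin_path)
txt_size = os.path.getsize(txt_path)
print(f"binary: {bin_size} bytes, text: {txt_size} bytes")

# Read the raw file back; shape and dtype must be supplied by the reader
# because the file itself stores neither.
restored = np.fromfile(bin_path, dtype=np.float64).reshape(outfile.shape)
```

Since the raw file carries no metadata, the reader has to know the row count, column count, and dtype in advance. The binary file takes exactly 8 bytes per value, whereas the default `savetxt` format spends roughly three times that on each number.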
