2

I need to create a lookup table that can be used in an application where speed and efficiency are extremely important. The table will store time values, which are logarithmically distributed so that each order of magnitude has an equal number of time steps. Each time value will point to an array of wavelength values which have a intensity value associated with them. So something like this:

   t         lambda           I
0.0001 ->     0.01     ->    100
   .          0.02     ->    300
   .            .             .
   .            .             .
                .             .
0.0002 ->     0.01     ->    200
   .          0.02     ->    400
   .            .             .
   .            .             .
                .             .

etc...

A function in some C code will be passed a time and wavelength and will look up the corresponding intensity from the table. The function needed to generate the correct intensity is quite taxing, so this is why I have chosen to go with a lookup table. I wish to write the lookup table to a binary file as this file will be loaded in and out of RAM on many nodes on a computing cluster. As I am not familiar with lookup tables I was wondering what would be the best (as in fastest/most efficient) way to implement this.

Also, is it possible to write the binary file from a data structure created in python, that can then be read in C? This would be quite useful in my specific application because I am already interfacing with some python code to generate the values for the table.

jasper
  • 193
  • 3
  • 17
  • Is the table precomputed? – Andrew Johnson Aug 05 '14 at 17:34
  • The values are computed and outputted into a text file with columns of data so I will need to then put that data into some sort of structure. I was wondering what sort of structure would be best... But since I will be making it in python and loading it in C I wasn't to sure about how to go about it. – jasper Aug 05 '14 at 17:37
  • It sounds like you can use fixed length rows and just read/write int/float types into the file directly. If you are on a uniform computing cluster then you won't have to worry about big/little endian or 32bit/64bit issues and can just read bytes directly. – Andrew Johnson Aug 05 '14 at 17:40
  • Of course once you have a binary file, what you read it with does not matter. a file is a file is a file. – mckenzm May 02 '19 at 02:21

1 Answers1

2

You can use the struct module, especially struct.pack to convert Python data into a string of binary data that you can then write to a file.

What the most efficient way of accessing the data is depends on the particulars. If you are using the same range of lambda values for all time values and the time intervals are always the same, then you know the length of the array of intensities for each t. In that case you can say e.g.

offset = ((time - 0.001)/0.001 * amount_of_intensities + (lambda - 0.01)/0.01)

and then use that offset to create a pointer. This assumes that you've read the binary file into memory and have created a pointer of the right type to it.

An example (in IPython):

In [1]: import numpy as np

In [2]: data = np.random.random(20)

In [3]: data
Out[3]: 
array([ 0.40184104,  0.60411243,  0.52083848,  0.50300288,  0.14613242,
        0.39876911,  0.16157968,  0.70979254,  0.65662686,  0.14884378,
        0.65650842,  0.40906677,  0.3027295 ,  0.26070303,  0.82051509,
        0.96337179,  0.34622595,  0.08532211,  0.65079174,  0.68009011])

In [4]: import struct

In [5]: struct.pack('{}d'.format(len(data)), *data)
Out[5]: 'f\xf9\x80y\xc3\xb7\xd9?\xe2x\x92\x99\xe3T\xe3?0vCt\xb5\xaa\xe0?7\xfcJ|\x99\x18\xe0?X\xf5l\x8ew\xb4\xc2?b\x9c\xd1\xden\x85\xd9?\xc4\x0c\xad\x9d\xa4\xae\xc4?\xae\xc3\xbe\xd7\x9e\xb6\xe6?\xd5\xf3\xebV\x16\x03\xe5?\x14J\x9a$P\r\xc3?p\xd4t\xf3\x1d\x02\xe5?\xfe\tUg&.\xda?\xf4hV\x91\xeb_\xd3?@FL\xc0[\xaf\xd0?$\xbe\x08\xda\xa8A\xea?\xf3\x93\xcb\x11\xf1\xd3\xee?\xce\x9e\xd9\xe7\x90(\xd6?\x10\xd2\x12c\xab\xd7\xb5?f\xac\x124I\xd3\xe4?}\x95\x1cSL\xc3\xe5?'

I'm using the numpy module for convenience. It would work just as well with a list of floating point numbers.

To analyse the last line from the inside out. The format expression gives:

In [9]: '{}d'.format(len(data))
Out[9]: '20d'

This means we want to make a string of 20 d values. The d is the format character for a IEEE 754 double width floating point number.

So what we really have is;

struct.pack('20d', *data)

The *-operator before the data means "unpack this list".

Note that binary numbers generally not portable between different hardware platforms (e.g. intel x86 and ARM).

Once you have this big array of binary numbers, you can just write that to a file.

In C, open the file and read the whole thing into a block of memory. Then make a pointer of the correct type to the beginning of that block of memory and you're good to go.

Roland Smith
  • 42,427
  • 3
  • 64
  • 94
  • There is already an answer with more detail about this [here](http://stackoverflow.com/questions/18367007/python-how-to-write-to-a-binary-file). – Andrew Johnson Aug 05 '14 at 17:32
  • The time steps are logarithmically distributed, so each order of magnitude has an equal number of divisions. I suppose that changes things... I will update the question to specify that. – jasper Aug 05 '14 at 18:26
  • To clarify the above, would the array look something like this _data = [t0, w00, i00, w01, i01, ... w0n, i0n, t1, w10, i10, w11, i11, ... tn, wn0, in0, wn1, in1, .. wnn, inn]_ where **tx** is time step, **wxy** is a corresponding wavelength and **ixy** is the corresponding intensity? Then one would use **offset** to jump ahead a certain number of bits in the binary file to find the correct data point? – jasper Aug 05 '14 at 18:46
  • 1
    @jasper If the spacing of the time-steps and wavelengths are both *consistent* over the whole range, you *only* have to store the intensities. Because you can then use a formula to index the correct intensity. That the essence of a look-up table; that you can use a simple formula to find the correct number. If the number or spacing of the wavelength is not constant for all time-steps, the problem becomes a lot more complicated. – Roland Smith Aug 05 '14 at 19:58