Maybe I'll start with a small introduction to my problem. I'm writing a Python program which will be used for post-processing of different physical simulations. Every simulation can create up to 100 GB of output. I deal with different kinds of information (like positions, fields, densities, ...) for different time steps. I would like to have access to all of this data at once, which isn't possible because I don't have enough memory on my system. Normally I read a file, do some operations on it and clear the memory; then I read the next data, do some operations and clear the memory again.
Now my problem: if I do it this way, I end up reading the same data more than once, and this takes a lot of time. I would like to read it only once and store it for easy access. Do you know a method to store a lot of data which is really fast to access, or which doesn't need a lot of space?
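To make that clearer, my current workflow is roughly the following loop (the file pattern and the operation are only placeholders):

import glob
import numpy

# rough sketch of the current read -> process -> discard loop
# (the file pattern and the operation are placeholders)
for filename in sorted(glob.glob("output/step_*.dat")):
    data = numpy.loadtxt(filename)   # read one time step (the slow part)
    result = data.mean(axis=0)       # do some operations on it
    del data                         # clear the memory before the next file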
I just created a method which is around ten times faster than a normal open/read, but it uses cat
(the Linux command) for that. It's a really dirty method and I would like to kick it out of my script.
Is it possible to use databases to store this data and to retrieve it faster than by normal reading? (Sorry for this question, but I'm not a computer scientist and I don't know much about databases.)
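(If it helps: something like the following sqlite3 snippet is what I imagine by "using a database"; the table and column names are made up and I don't know if this is the right way to do it.)

import sqlite3

# made-up table layout, just to illustrate what I mean by "database"
conn = sqlite3.connect("simulation.db")
conn.execute("CREATE TABLE IF NOT EXISTS fields (step INTEGER, x REAL, y REAL, value REAL)")
conn.executemany("INSERT INTO fields VALUES (?, ?, ?, ?)",
                 [(0, 1.0, 2.0, 3.5), (0, 1.0, 2.5, 3.6)])
conn.commit()
rows = conn.execute("SELECT x, y, value FROM fields WHERE step = 0").fetchall()
conn.close()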
EDIT:
My cat code looks something like this (only an example):
import os
import string
from numpy import array, reshape

out = string.split(os.popen("cat " + base + "phs/phs01_00023_" + time).read())
# and if I want to have this data as an array, I convert it and reshape it
# (if I need to):
out = array(out)
out = reshape(out, (-1, ncols))  # ncols: number of columns in the file
Normally I would use the numpy method numpy.loadtxt, which needs about the same time as normal reading:
f = open('filename')
f.read()
...
I think that loadtxt just uses these normal methods with some additional lines of code.
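For reference, the loadtxt call I mean would look roughly like this (same base/time variables as above):

import numpy

# roughly the loadtxt call I mean (same base/time variables as above)
out = numpy.loadtxt(base + "phs/phs01_00023_" + time)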
I know there are better ways to read data, but everything I found was really slow. I will now try mmap
and hopefully get better performance.
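One thing I plan to try is numpy's memory mapping (numpy.memmap, which is based on mmap). Since it only works on raw binary data, the idea would be to convert each text file once and then map it, roughly like this (the float64 dtype and the .bin file name are just assumptions):

import numpy

# one-time conversion: text output -> flat binary file (float64 is an assumption)
data = numpy.loadtxt(base + "phs/phs01_00023_" + time)
data.astype(numpy.float64).tofile(base + "phs/phs01_00023_" + time + ".bin")

# afterwards the binary file can be memory-mapped instead of being read completely;
# dtype and shape have to be known in advance
out = numpy.memmap(base + "phs/phs01_00023_" + time + ".bin",
                   dtype=numpy.float64, mode="r").reshape(data.shape)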