2

I am working with 3D volumetric images, possibly large (256x256x256). I have 3 such volumes that I want to read in and operate on. Presently, each volume is stored as a text file of numbers, which I read in using ifstream. I save it in a matrix (a class I have written using dynamic allocation of a 3D array). Then I perform operations on these 3 matrices: addition, multiplication and even Fourier transform. So far everything works, but it takes an awfully long time, especially the Fourier transform, since it has 6 nested loops.

I want to know how I can speed this up. Also, whether the fact that I have stored the images in text files makes a difference. Should I save them as binary or in some other easier/faster to read in format? Is fstream the fastest way I can read in? I use the same 3 matrices each time without changing them. Does that make a difference? Also, is pointer to pointer to pointer the best way to store a 3D volume? If not what else can I do?

phuclv
Lisa
    A 3D FFT will be inherently slow - have you done any profiling? – Roger Rowland May 17 '13 at 07:16
  • Have you written everything yourself, or are you using any library? Anyway, before you start optimizing something like file read-in, it is a good idea to measure the time the other parts of the program take. – ChaosCakeCoder May 17 '13 at 08:05
  • @Roger: yes, I'm aware. I haven't done any profiling. I just have rough estimates of the time taken by putting break points. – Lisa May 17 '13 at 10:05
  • @ChaosCakeCoder: I've written it myself. I don't want to use any libraries at present, just sticking with basic c++ – Lisa May 17 '13 at 10:07
  • Note that large arrays with sizes equal to a power of 2 may be very slow http://stackoverflow.com/questions/6060985/why-huge-performance-hit-2048x2048-array-versus-2047x2047 http://stackoverflow.com/questions/12264970/why-is-my-program-slow-when-looping-over-exactly-8192-elements?lq=1 http://stackoverflow.com/questions/11413855/why-is-transposing-a-matrix-of-512x512-much-slower-than-transposing-a-matrix-of?lq=1 You should change it to like [256][256][257] instead and read/write line by line, each line 256 elements, in binary form – phuclv Jun 15 '14 at 02:56

4 Answers

4

Also, is pointer to pointer to pointer best way to store a 3d volume?

Nope, that's usually very inefficient.

If not what else can I do?

It's likely that you will get better performance if you store it in a contiguous block and use computed offsets into the block.

I'd usually use a structure like this:

class DataBlock {
public:
  unsigned int nx;
  unsigned int ny;
  unsigned int nz;
  std::vector<double> data;

  DataBlock(unsigned int in_nx, unsigned int in_ny, unsigned int in_nz) :
    nx(in_nx), ny(in_ny), nz(in_nz), data(in_nx*in_ny*in_nz, 0)
  {}

  //You may want to make this check bounds in debug builds
  double& at(unsigned int x, unsigned int y, unsigned int z) {
    return data[ x + y*nx + z*nx*ny ];
  }

  const double& at(unsigned int x, unsigned int y, unsigned int z) const {
    return data[ x + y*nx + z*nx*ny ];
  }

private:
  //Don't want this class copied, so remove the copy constructor and assignment.
  DataBlock(const DataBlock&);
  DataBlock& operator=(const DataBlock&);
};
Michael Anderson
  • I guess you mean unsigned int nz;? Oh... you don't use it anyway. – ChaosCakeCoder May 17 '13 at 08:10
  • Thank you Michael. So in the example you gave, do I not require a destructor because a vector is used to store the data? – Lisa May 17 '13 at 12:30
  • @Lisa that's correct - I use a vector purely to avoid having to manage the memory myself. It also makes the copy constructor and assignment operators trivial if you wanted them (but in this case I'd remove them so you never accidentally copy this huge chunk of data). – Michael Anderson May 18 '13 at 13:01
  • Michael, got it. Another question - So, after I create a DataBlock object, how do I iterate through the elements? – Lisa May 20 '13 at 08:09
  • That's what the `at` function is for. `DataBlock d(10,10,10); d.at(1,2,3)=0;` – Michael Anderson May 20 '13 at 08:11
  • Yeah, but in that case, were I to perform some operation on each element, I'd still have to have 3 loops wouldn't I? – Lisa May 20 '13 at 08:53
  • You would but your compiler should be able to optimise that out really nicely, and you should get good cache locality. It has a much harder time optimizing away nested pointer traversing like your original code. – Michael Anderson May 20 '13 at 09:29
  • Possibly, but unfortunately it still takes an unacceptable amount of time, due to the Fourier transform I guess! – Lisa May 20 '13 at 11:28
  • If FFT is the bottleneck you should ensure you are using a good library and that your data is stored in a format directly compatible with the library you choose. A well-respected GPL/commercial library is http://www.fftw.org/ - there's a bigger list of FFT libraries and speed comparisons here: http://www.fftw.org/benchfft/ffts.html – Michael Anderson May 21 '13 at 00:20
3

Storing a large (256³ elements) 3D image file as plain text is a waste of resources.

Say each line of your plain-text file holds one value: to read a value you have to scan characters until you find the end of the line (for a 3-digit number that is 4 bytes: 3 bytes for the digits, 1 byte for the newline), and afterwards you have to convert those digits to a number. With binary, you directly read a fixed number of bytes and you have your number. You could and should write and read it as a binary image.

There are several formats for doing so; the one I would recommend is the MetaImage file format of VTK. In this format you have a plain-text header file and a binary file with the actual image data. With the information from the header file you know how large your image is and what datatype it uses. In your program, you then directly read the binary data and save it to a 3D array.
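For illustration, a MetaImage header (.mhd) for such a volume might look roughly like this; the field values and the data filename are illustrative, not from the answer:

```
ObjectType = Image
NDims = 3
DimSize = 256 256 256
ElementType = MET_DOUBLE
ElementDataFile = volume.raw
```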

If you really want to speed things up, use CUDA or OpenCL which will be pretty fast for your applications.

There are several C++ libraries that can help you with writing, saving and manipulating image data, including the before-mentioned VTK and ITK.

Dirk
  • Ok, thank you for that explanation; that helped. I will convert to binary in that case. And yes, I am aware of VTK and ITK but I have to stay away from libraries for this project. – Lisa May 17 '13 at 10:15
1

256³ is a rather large number. Parsing 256³ text strings will take a considerable amount of time. Using binary makes the reading/writing process much faster, because it doesn't require converting numbers to/from strings, and it uses much less space. For example, to read the number 123 as a char from a text file, the program needs to read it as a string and convert it from decimal to binary using lots of multiplies by 10. Whereas if you had written it directly as the binary value 0b01111011, you only need to read that byte back into memory, with no conversion at all.

Using hexadecimal numbers may also increase reading speed, since each hex digit maps directly to a binary value, but if you need more speed, a binary file is the way to go. A single fread call is enough to load the whole 256³ bytes = 16 MB file into memory in less than a second. And when you're done, just fwrite it back to the file. To speed things up you can use SIMD (SSE/AVX), CUDA or another parallel processing technique. You can improve the speed even further with multithreading, or by saving only the non-zero values, because in many cases most values will be 0.

Another reason may be that your array is large and each dimension is a power of 2. This has been discussed in many questions on SO.

You may consider changing the last dimension to 257 and trying again, or better, use another algorithm like divide-and-conquer that's more cache friendly.

0

You should add timers around the load and the processing so you know which takes the most time, and focus your optimization efforts there. If it is the load and you control the file format, make one that is more efficient to read. If it is the processing, I'll echo what previous folks have said: investigate efficient memory layout as well as GPGPU computing. Good luck.

Dithermaster