
Our computer receives hundreds of images at a time, and we need to rotate and resize them as fast as possible. Rotation is always by 90, 180 or 270 degrees.

Currently we use the command-line tool GraphicsMagick to rotate the images. Rotating each image (5760×3840, ~22 MP) takes around 4 to 7 seconds.

The following Python code, sadly, gives us similar timings:

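import cv  # legacy OpenCV "cv" Python API (superseded by cv2)

img = cv.LoadImage("image.jpg")  # decodes the JPEG
# CreateImage takes (width, height): swap them for the transposed result
timg = cv.CreateImage((img.height, img.width), img.depth, img.channels)

# rotate 90° counter-clockwise: transpose, then flip around the x-axis (flipMode=0)
cv.Transpose(img, timg)
cv.Flip(timg, timg, flipMode=0)
cv.SaveImage("rotated_counter_clockwise.jpg", timg)  # re-encodes the JPEG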

Is there a faster way to rotate the images using the power of the graphics card? OpenCL and OpenGL come to mind, but we are wondering whether the performance increase would be noticeable.

The hardware we are using is fairly limited, as the device should be as small as possible.

The software is Debian 6 with the official (closed-source) Radeon drivers.

Thomaschaaf
  • While reading this question, I wondered to myself: What percentage of the time is spent doing each part of this operation? How much of the wait is from the JPEG encoding vs. the actual rotation operation? And how much of the wait is from disk IO? The answers to those questions might have an impact on your optimizations. – csd Jul 09 '12 at 14:26
  • Just use jpegtran, which, as a nice side effect, doesn't impair the quality. – datenwolf Jul 09 '12 at 16:50
  • Can you provide timings for each part of the code you pasted (after loading, after transpose, after flip, after save)? – Daniel Mošmondor Jul 10 '12 at 23:27
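To answer the commenters' questions, a small stdlib-only helper can time each stage of the processing separately, so decoding, rotation, encoding and disk I/O can be compared. This is a sketch; `time_stages` is a hypothetical name, and each stage is any callable that takes the previous stage's result.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def time_stages(
    stages: Sequence[Tuple[str, Callable[[Any], Any]]], initial: Any = None
) -> Tuple[Any, List[Tuple[str, float]]]:
    """Run the stages in order, feeding each result into the next stage.

    Returns the final result and a list of (stage name, seconds) pairs.
    """
    value = initial
    timings: List[Tuple[str, float]] = []
    for name, fn in stages:
        start = time.perf_counter()  # monotonic, high-resolution clock
        value = fn(value)
        timings.append((name, time.perf_counter() - start))
    return value, timings
```

For example, with the modern `cv2` bindings the stages might be `lambda _: cv2.imread("image.jpg")`, `lambda img: cv2.transpose(img)`, `lambda img: cv2.flip(img, 0)` and `lambda img: cv2.imwrite("out.jpg", img)`. Adding a stage that only reads the file's raw bytes would separate disk I/O from JPEG decoding.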

3 Answers


You can perform a lossless rotation that just modifies the EXIF section (the Orientation tag). This will rotate your pictures faster.
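A minimal stdlib-only sketch of what "modifying the EXIF section" can look like: it patches the Orientation tag (0x0112) in IFD0 of an existing APP1/Exif segment in place, so the compressed pixel data is never touched and the file size does not change. Assumptions: the file already contains an Exif segment with an Orientation entry, there are no fill bytes between markers, and the new value replaces (rather than composes with) any existing orientation. `set_exif_orientation` and `ROTATION_TO_ORIENTATION` are hypothetical names, and not all viewers honour the tag.

```python
import struct

ORIENTATION_TAG = 0x0112  # TIFF/EXIF tag number for Orientation

# Clockwise display rotation -> EXIF Orientation value (assumes upright stored pixels):
# 1 = as stored, 6 = viewer rotates 90° CW, 3 = 180°, 8 = viewer rotates 90° CCW (270° CW)
ROTATION_TO_ORIENTATION = {0: 1, 90: 6, 180: 3, 270: 8}


def set_exif_orientation(jpeg: bytes, rotation_cw: int) -> bytes:
    """Return a copy of `jpeg` whose Orientation tag requests the given rotation."""
    if rotation_cw not in ROTATION_TO_ORIENTATION:
        raise ValueError("rotation must be 0, 90, 180 or 270")
    value = ROTATION_TO_ORIENTATION[rotation_cw]
    data = bytearray(jpeg)
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt marker structure")
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):  # EOI or start of scan: no more header segments
            break
        seg_len = struct.unpack(">H", data[pos + 2:pos + 4])[0]
        seg_start = pos + 4  # first byte after the 2-byte length field
        if marker == 0xE1 and data[seg_start:seg_start + 6] == b"Exif\x00\x00":
            tiff = seg_start + 6  # IFD offsets are relative to the TIFF header
            endian = "<" if data[tiff:tiff + 2] == b"II" else ">"
            ifd0 = tiff + struct.unpack(endian + "I", data[tiff + 4:tiff + 8])[0]
            count = struct.unpack(endian + "H", data[ifd0:ifd0 + 2])[0]
            for i in range(count):
                entry = ifd0 + 2 + 12 * i  # each IFD entry is 12 bytes
                tag = struct.unpack(endian + "H", data[entry:entry + 2])[0]
                if tag == ORIENTATION_TAG:
                    # a single SHORT is stored inline in the first 2 bytes of the value field
                    struct.pack_into(endian + "H", data, entry + 8, value)
                    return bytes(data)
            raise ValueError("Exif segment has no Orientation tag")
        pos += 2 + seg_len
    raise ValueError("no Exif segment found")
```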

Also have a look at the jpegtran utility, which performs lossless JPEG transformations: https://linux.die.net/man/1/jpegtran
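For batch use, jpegtran can be driven from Python with `subprocess`. A minimal sketch, assuming `jpegtran` (from libjpeg or libjpeg-turbo) is on the PATH: `-rotate` rotates clockwise, `-copy all` keeps EXIF and other extra markers, and `-perfect` makes jpegtran fail instead of leaving non-transformable partial edge blocks behind. `jpegtran_rotate_cmd` and `jpegtran_rotate` are hypothetical names.

```python
import subprocess
from typing import List


def jpegtran_rotate_cmd(src: str, dst: str, degrees: int, perfect: bool = True) -> List[str]:
    """Build a jpegtran command line for a lossless clockwise rotation."""
    if degrees not in (90, 180, 270):
        raise ValueError("jpegtran -rotate accepts 90, 180 or 270")
    cmd = ["jpegtran", "-rotate", str(degrees), "-copy", "all"]
    if perfect:
        # fail rather than silently leave untransformable edge blocks in place
        cmd.append("-perfect")
    cmd += ["-outfile", dst, src]
    return cmd


def jpegtran_rotate(src: str, dst: str, degrees: int, perfect: bool = True) -> None:
    """Run jpegtran; raises CalledProcessError if it exits with an error."""
    subprocess.run(jpegtran_rotate_cmd(src, dst, degrees, perfect), check=True)
```

Keeping the command builder separate from the `subprocess` call makes it easy to inspect or log the exact command, or to hand the argument lists to a process pool.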

A.G.
  • Changing the Exif Orientation tag is probably the fastest method; however, not all image viewers honor it. `jpegtran` seems like a good solution. It will only partially re-compress your image, which should still be pretty fast. – Piotr Praszmo Jul 09 '12 at 14:38
  • If the image width/height are multiples of 8, you can rotate by 90/180/270° by simply reordering the components without recompression. – Martin Beckett Jul 09 '12 at 14:39
  • @MartinBeckett: Note that most JPEG images are stored with dimensions that are multiples of 8 and just apply cropping afterwards, so jpegtran should be able to reorder components for most images. – datenwolf Jul 09 '12 at 16:53

There is a JPEG lossless-transformation plugin for IrfanView which, IIRC, can rotate and resize images (in simple ways) without recompressing. It can also run on a whole directory of images, which should be a lot faster.
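IrfanView is a Windows tool; on a Debian box, a similar whole-directory batch can be scripted around any command-line lossless transformer, such as jpegtran. A minimal sketch, assuming one external process per file; `batch_transform` is a hypothetical name, and `make_cmd` is a caller-supplied function that returns the argument list for one (source, destination) pair. Threads are enough here because the actual work happens in the child processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Union


def batch_transform(
    in_dir: Union[str, Path],
    out_dir: Union[str, Path],
    make_cmd: Callable[[str, str], List[str]],
    workers: int = 4,
    run: Callable[..., object] = subprocess.run,
) -> List[Path]:
    """Run one command per *.jpg in in_dir, up to `workers` at a time.

    Returns the output paths in sorted input order; the first failing
    command's exception is re-raised.
    """
    in_path, out_path = Path(in_dir), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    sources = sorted(in_path.glob("*.jpg"))

    def one(src: Path) -> Path:
        dst = out_path / src.name
        run(make_cmd(str(src), str(dst)), check=True)  # check=True raises on failure
        return dst

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, sources))
```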

The GPU probably wouldn't help: you are almost certainly I/O limited in OpenCV, which is not really optimised for high-speed file access.

Martin Beckett
  • Here you'll find more utilities that will do this rotation losslessly, without decompressing and recompressing the image: http://jpegclub.org/losslessapps.html – Mārtiņš Možeiko Jul 09 '12 at 14:29
  • For a large number of images, buffering and/or async memory transfers can alleviate the I/O bottleneck - so I wouldn't say that a GPU-based implementation won't help. – Ani Jul 09 '12 at 14:44
  • @ananthonline - if the JPEG is simply being rotated by multiples of 90°, then you just have to reshuffle the compressed values in each 8x8 block. A GPU doesn't really help there and is generally slow at random memory reads/writes, even once you have the data on the card. It may be faster if you were recompressing, although DCT with SSE2 is very fast. – Martin Beckett Jul 09 '12 at 14:48
  • Well - you DO have to recompress certain blocks because the image size changed, no? And for a large image, even those will benefit from the massive parallelism of the GPU. And decoding + lossy rotate options become feasible when using the GPU. – Ani Jul 09 '12 at 14:55
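Why a 90° rotation can be done on the compressed data: for each 8×8 block, rotating the pixels is equivalent to transposing the block's DCT coefficients and negating those with an odd horizontal frequency. That maps quantized integers to quantized integers, so nothing is lost (the quantization table has to be transposed too, and the coefficients are still entropy-decoded and re-encoded). The sketch below checks this identity numerically with a naive, unnormalized DCT-II; it only illustrates the math, and the function names are hypothetical.

```python
import math
from typing import List

N = 8  # JPEG block size
Block = List[List[float]]


def dct2(block: Block) -> Block:
    """Unnormalized 2-D DCT-II; u indexes vertical, v horizontal frequency."""
    return [[sum(block[r][c]
                 * math.cos(math.pi * (r + 0.5) * u / N)
                 * math.cos(math.pi * (c + 0.5) * v / N)
                 for r in range(N) for c in range(N))
             for v in range(N)]
            for u in range(N)]


def rotate_pixels_cw(block: Block) -> Block:
    """90° clockwise rotation in the pixel domain: out[i][j] = in[N-1-j][i]."""
    return [[block[N - 1 - j][i] for j in range(N)] for i in range(N)]


def rotate_coeffs_cw(coeffs: Block) -> Block:
    """Same rotation on DCT coefficients: transpose, then negate odd horizontal frequencies."""
    return [[(-1) ** v * coeffs[v][u] for v in range(N)] for u in range(N)]
```

Usage: for any 8×8 block `b`, `dct2(rotate_pixels_cw(b))` and `rotate_coeffs_cw(dct2(b))` agree up to floating-point rounding.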

I'm not an expert in JPEG and compression topics, but since your problem is about as I/O-limited as it gets (assuming that you can rotate without heavy de/encoding-related computation), you might not be able to accelerate it much on the GPU you have. (Un)luckily, your reference is a pretty slow Atom CPU.

I assume that the Radeon has its own dedicated memory. This means that the data needs to be transferred over PCI-E, which adds latency compared to CPU execution; unless that latency is hidden, you can be sure it is the bottleneck. This is the most probable reason why your code that uses OpenCV on the GPU is slow (besides the fact that you do two memory-bound operations, transpose and flip, instead of a single one).
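The "single operation instead of transpose + flip" point can be illustrated with the index mapping of a one-pass rotation: each source pixel is read once and written straight to its final position. The sketch below is pure Python over a flat, row-major, interleaved buffer, so it is far too slow for production; it only shows the arithmetic that a C loop or an OpenCL kernel (one work-item per pixel) would use. `rotate_flat` is a hypothetical name.

```python
from typing import Tuple


def rotate_flat(src: bytes, width: int, height: int, channels: int,
                quarter_turns_ccw: int) -> Tuple[bytes, int, int]:
    """Rotate a row-major interleaved image in one pass.

    Returns (pixels, new_width, new_height).
    """
    if len(src) != width * height * channels:
        raise ValueError("buffer size does not match width*height*channels")
    k = quarter_turns_ccw % 4
    if k == 0:
        return bytes(src), width, height
    out_w, out_h = (height, width) if k % 2 else (width, height)
    dst = bytearray(len(src))
    for r in range(height):
        for c in range(width):
            if k == 1:    # 90° counter-clockwise
                orow, ocol = width - 1 - c, r
            elif k == 2:  # 180°
                orow, ocol = height - 1 - r, width - 1 - c
            else:         # 270° counter-clockwise == 90° clockwise
                orow, ocol = c, height - 1 - r
            s = (r * width + c) * channels
            d = (orow * out_w + ocol) * channels
            dst[d:d + channels] = src[s:s + channels]
    return bytes(dst), out_w, out_h
```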

The key is to hide as much of the PCI-E transfer time as possible behind computation by using multiple buffering. Overlapping transfers both to and from the GPU with computation, by making use of the full-duplex capability of PCI-E, will only work if the card has dual DMA engines, like high-end Radeons or NVIDIA Quadro/Tesla cards, which I highly doubt yours has.
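The same multiple-buffering idea applies to the host side of the pipeline (reading files, transforming, writing results): if each stage runs concurrently and only a bounded number of buffers is in flight between stages, the slowest stage sets the throughput instead of the sum of all stages. A stdlib sketch, not GPU code; `pipeline` is a hypothetical name, and the overlap is real in CPython only when the stages spend their time in I/O or in code that releases the GIL (file reads/writes, external processes).

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Sequence

_DONE = object()  # sentinel marking the end of the stream


class _StageError:
    """Wraps an exception so it travels downstream instead of killing a thread."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def pipeline(items: Iterable[Any], stages: Sequence[Callable[[Any], Any]],
             depth: int = 2) -> List[Any]:
    """Run each stage in its own thread, linked by queues of at most `depth` items.

    depth=2 is classic double buffering. Output order matches input order.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def feed() -> None:
        for item in items:
            queues[0].put(item)  # blocks while the first buffer queue is full
        queues[0].put(_DONE)

    def work(fn: Callable[[Any], Any], inq: queue.Queue, outq: queue.Queue) -> None:
        while True:
            item = inq.get()
            if item is _DONE:
                outq.put(_DONE)
                return
            if isinstance(item, _StageError):
                outq.put(item)  # pass earlier failures through untouched
                continue
            try:
                outq.put(fn(item))
            except Exception as exc:
                outq.put(_StageError(exc))

    threads = [threading.Thread(target=feed, daemon=True)]
    threads += [threading.Thread(target=work, args=(fn, queues[i], queues[i + 1]), daemon=True)
                for i, fn in enumerate(stages)]
    for t in threads:
        t.start()

    results: List[Any] = []
    errors: List[BaseException] = []
    while True:  # drain everything so no upstream thread stays blocked
        item = queues[-1].get()
        if item is _DONE:
            break
        if isinstance(item, _StageError):
            errors.append(item.exc)
        else:
            results.append(item)
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results
```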

If your GPU compute time (the time it takes the GPU to do the rotation) is lower than the transfer time, you won't be able to fully overlap. The HD 4530 has a pretty slow memory interface with only 12.8 GB/s peak, and the rotation kernel should be quite memory bound. I can only guesstimate, but if you reach a peak PCI-E transfer rate of ~1.5 GB/s (4x PCI-E AFAIK), the compute kernel will be a few times faster than the transfer and you'll be able to overlap very little. You can simply time the parts separately, without writing elaborate asynchronous code, and estimate how fast things could get with optimal overlap.
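The estimate above can be made concrete with back-of-envelope arithmetic. This sketch reads the bandwidth figures as gigabytes per second and assumes a decoded 8-bit RGB image (3 bytes per pixel), one transfer in each direction at peak PCI-E rate, and a kernel that reads and writes every byte exactly once. For 5760×3840 this gives 66,355,200 bytes per image, about 44 ms per transfer direction and about 10 ms of kernel time. `estimate_gpu_round_trip` is a hypothetical name.

```python
from typing import Dict


def estimate_gpu_round_trip(width: int, height: int, bytes_per_pixel: int,
                            pcie_gb_per_s: float, vram_gb_per_s: float) -> Dict[str, float]:
    """Peak-rate timing estimate in seconds, using decimal GB (1e9 bytes)."""
    size = width * height * bytes_per_pixel
    upload = size / (pcie_gb_per_s * 1e9)        # host -> GPU
    download = size / (pcie_gb_per_s * 1e9)      # GPU -> host
    kernel = 2 * size / (vram_gb_per_s * 1e9)    # read every byte once, write it once
    return {"bytes": size, "upload_s": upload, "download_s": download, "kernel_s": kernel}


estimate = estimate_gpu_round_trip(5760, 3840, 3, pcie_gb_per_s=1.5, vram_gb_per_s=12.8)
```

If these peak figures hold, the transfers take roughly eight times as long as the kernel, so there is little compute to hide them behind. Even the whole round trip (under 0.1 s) is far below the 4 to 7 seconds measured in the question, which hints that most of that time goes into JPEG decoding and encoding rather than the rotation itself.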

One thing you might want to consider is getting hardware on which PCI-E is not a bottleneck, e.g.:

  • AMD APU-based system. On these platforms you will be able to page-lock the memory and use it directly from the GPU;
  • integrated GPUs which share main memory with the host;
  • a fast low-power CPU like a mobile Intel Ivy Bridge e.g. i5-3427U which consumes almost as little as the Atom D525 but has AVX support and should be several times faster.
pszilard