
I have read in several places that building a Huffman encoder on a GPU is not very efficient because the algorithm is sequential. However, this paper offers a possible implementation and claims it is faster than a CPU: http://tesla.rcub.bg.ac.rs/~taucet/docs/papers/PAVLE-AnaBalevic09.pdf .

Please advise whether the results of the paper are incorrect.

Programmer

2 Answers

3

It looks like an interesting approach, but I'll offer one caveat: there is very little information about the baseline CPU implementation, which is most likely single-threaded and may not be particularly optimised. It's human nature to want to make your optimised implementation look as good as possible, so people tend to use a mediocre baseline in order to show an impressive speed-up ratio. For all we know, a suitably optimised multi-threaded implementation on the CPU could match the GPGPU performance, in which case the GPGPU implementation would not be so impressive.

Before investing a lot of effort in a GPGPU implementation, I would first exhaust all the optimisation possibilities on the CPU (perhaps even using the parallel algorithm described in the paper, and exploiting SIMD, threading, etc.), since a CPU implementation that meets your performance requirements would be a lot more portable and useful than a solution tied to a particular GPU architecture.
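
Just to make the point concrete, here is a minimal sketch (my own, not from the paper) of what a threaded CPU baseline for the first phase, the byte-frequency count, might look like; the function name and chunking strategy are illustrative assumptions:

    // Hypothetical sketch (mine, not from the paper): a multi-threaded CPU
    // byte-frequency count, the kind of baseline worth measuring first.
    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    std::array<uint64_t, 256> parallel_histogram(const uint8_t* data, size_t n,
                                                 unsigned num_threads)
    {
        std::vector<std::array<uint64_t, 256>> partial(num_threads);
        std::vector<std::thread> workers;
        size_t chunk = (n + num_threads - 1) / num_threads;

        for (unsigned t = 0; t < num_threads; ++t) {
            workers.emplace_back([&, t] {
                partial[t].fill(0);
                size_t begin = t * chunk;
                size_t end   = std::min(n, begin + chunk);
                for (size_t i = begin; i < end; ++i)
                    ++partial[t][data[i]];        // private histogram: no contention
            });
        }
        for (auto& w : workers) w.join();

        std::array<uint64_t, 256> total{};        // merge the per-thread histograms
        for (const auto& p : partial)
            for (int s = 0; s < 256; ++s)
                total[s] += p[s];
        return total;
    }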

Paul R
  • What do you mean by a "particular GPU architecture"? – Programmer Feb 04 '12 at 17:12
  • @Programmer: well, if you develop, say, a CUDA solution for current-gen nVidia cards, it won't work with other (non-nVidia) GPU cards, and it may not be optimal for future generations of nVidia cards. If this is just a piece of software with a short lifetime and/or limited distribution or little need for portability, that may not matter, but in the general case it could be severely limiting and high-maintenance if it's an actual product. – Paul R Feb 04 '12 at 18:09
  • How about I just run the first phase of the algorithm on the GPU, i.e. the phase in which we build the frequency table for the terms? I can use the CUDPP hash table to gather counts; the actual algorithm can then be run on the CPU (a plain histogram kernel along these lines is sketched after this comment thread). – Programmer Feb 04 '12 at 18:12
  • 2
    Why do you even want/need to do this on the GPU? Is it for a student project or something? If it's for a real-world application then you really should explore all the CPU optimisation possibilities first and establish whether you really do need to look at a GPGPU solution or not. – Paul R Feb 04 '12 at 18:21
  • A GPU implementation on top of Metal for iOS can be found linked at this SO question: https://stackoverflow.com/q/3013391/763355 – MoDJ Jan 03 '18 at 22:02
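
Regarding the frequency-table comment above: for a 256-symbol alphabet you don't strictly need a hash table; a plain shared-memory histogram kernel is usually enough. The kernel below is a hypothetical sketch of that approach, not CUDPP's API and not code from the paper:

    // Hypothetical sketch, not CUDPP's API: byte-frequency table (first phase) on the GPU.
    // With a 256-symbol alphabet, a shared-memory histogram with atomics is enough.
    #include <cstdint>
    #include <cuda_runtime.h>

    __global__ void histogram256(const uint8_t* data, size_t n, unsigned int* freq)
    {
        __shared__ unsigned int local[256];              // per-block histogram
        for (int i = threadIdx.x; i < 256; i += blockDim.x)
            local[i] = 0;
        __syncthreads();

        size_t idx    = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        size_t stride = (size_t)blockDim.x * gridDim.x;
        for (size_t i = idx; i < n; i += stride)
            atomicAdd(&local[data[i]], 1u);              // cheap shared-memory atomics
        __syncthreads();

        for (int i = threadIdx.x; i < 256; i += blockDim.x)
            atomicAdd(&freq[i], local[i]);               // one global update per block/symbol
    }

    // Launch with e.g. histogram256<<<num_blocks, 256>>>(d_data, n, d_freq);
    // after zeroing d_freq with cudaMemset.
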
1

You are right: the Huffman algorithm is sequential, though that is not a bottleneck for high-speed encoding. Please have a look at the following session from GTC 2012. This is a real solution, not just an example.

There you can find some benchmarks for CPU and GPU Huffman encoding and decoding. Huffman encoding on the GPU is much faster than on the CPU. JPEG decoding on the GPU can be much slower than on the CPU only in the case where there are no restart markers in the JPEG image.

If you need Huffman coding for something other than JPEG, you should use a two-pass algorithm: on the first pass you collect the statistics, and on the second pass you do the encoding. Both passes can be parallelised, so it is better to use the GPU than the CPU.
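
For context, the piece that stays sequential is the code-table construction between the two passes; a minimal CPU sketch of it (my own illustration, not the GTC implementation) might look like this:

    // Hypothetical CPU sketch (my own illustration) of the sequential step between
    // the two passes: building the code table from the pass-1 frequencies.
    // Assumes code lengths fit in 32 bits (true for typical data; a production
    // version would use length-limited codes).
    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Node { uint64_t freq; int symbol; int left = -1, right = -1; };  // symbol < 0 => internal
    struct Code { uint32_t bits = 0; uint8_t length = 0; };

    std::vector<Code> build_code_table(const std::vector<uint64_t>& freq)
    {
        std::vector<Node> nodes;
        using Item = std::pair<uint64_t, int>;            // (frequency, node index)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

        for (int s = 0; s < (int)freq.size(); ++s)
            if (freq[s] > 0) {
                nodes.push_back({freq[s], s});
                pq.push({freq[s], (int)nodes.size() - 1});
            }

        std::vector<Code> table(freq.size());
        if (nodes.empty()) return table;                  // no data at all
        if (nodes.size() == 1) { table[nodes[0].symbol] = {0, 1}; return table; }

        while (pq.size() > 1) {                           // the classic sequential merge loop
            auto a = pq.top(); pq.pop();
            auto b = pq.top(); pq.pop();
            nodes.push_back({a.first + b.first, -1, a.second, b.second});
            pq.push({a.first + b.first, (int)nodes.size() - 1});
        }

        // Assign codes by an explicit depth-first walk from the root.
        struct Frame { int node; uint32_t bits; uint8_t len; };
        std::vector<Frame> stack{{(int)nodes.size() - 1, 0, 0}};
        while (!stack.empty()) {
            Frame f = stack.back(); stack.pop_back();
            const Node& nd = nodes[f.node];
            if (nd.symbol >= 0) { table[nd.symbol] = {f.bits, f.len}; continue; }
            stack.push_back({nd.left,  (f.bits << 1) | 0u, (uint8_t)(f.len + 1)});
            stack.push_back({nd.right, (f.bits << 1) | 1u, (uint8_t)(f.len + 1)});
        }
        return table;
    }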

There are a lot of papers saying that the GPU is not suitable for Huffman coding; that just means there have been a lot of attempts to solve the problem. The idea behind the solution is quite simple: do Huffman encoding on small chunks of data and process those chunks in parallel.
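
As a rough illustration of that chunking idea (a sketch under my own assumptions, not the PAVLE or GTC implementation), a CUDA kernel can give each thread its own chunk and a private, worst-case-sized output slot; an exclusive scan over the per-chunk bit counts then tells you where each compressed chunk belongs in the final stream:

    // Hypothetical CUDA sketch of the chunking idea (not the actual PAVLE/GTC kernel):
    // each thread encodes its own small chunk into a private, worst-case-sized slot;
    // an exclusive scan over chunk_bits can then compact the slots into one stream.
    #include <cstdint>
    #include <cuda_runtime.h>

    struct Code { uint32_t bits; uint8_t length; };      // 256-entry table built on the CPU

    __global__ void encode_chunks(const uint8_t* data, size_t n,
                                  const Code* table,
                                  uint32_t* out,          // chunk_words words per chunk
                                  uint32_t* chunk_bits,   // compressed size of each chunk
                                  int chunk_size, int chunk_words)
    {
        size_t chunk = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        size_t begin = chunk * chunk_size;
        if (begin >= n) return;
        size_t end = begin + chunk_size;
        if (end > n) end = n;

        uint32_t* dst    = out + chunk * chunk_words;    // private output slot
        uint64_t  window = 0;                            // bit accumulator, MSB-first
        int       filled = 0, word = 0;
        uint32_t  total  = 0;

        for (size_t i = begin; i < end; ++i) {
            Code c = table[data[i]];                     // assumes code lengths <= 32 bits
            window = (window << c.length) | c.bits;      // append this symbol's code
            filled += c.length;
            total  += c.length;
            while (filled >= 32) {                       // flush full 32-bit words
                dst[word++] = (uint32_t)(window >> (filled - 32));
                filled -= 32;
            }
        }
        if (filled > 0)                                  // flush the partial last word
            dst[word] = (uint32_t)(window << (32 - filled));
        chunk_bits[chunk] = total;
    }

Here chunk_words would be sized for the worst case (chunk_size symbols times the maximum code length, rounded up to words); the subsequent scan and the bit-level stitching of neighbouring chunks are the fiddly parts that the real implementations spend most of their effort on.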

Fyodor