0

I have a question regarding the JPEG Huffman Table and using the Huffman Table to construct the symbol/binary string from a Tree. Suppose, that in an Huffman Table for 3-Bit code Length the number of codes is greater than 6, then how do we add all those codes in the Tree? If I am correct only 6 codes can be added at the 3-bit level/depth of the tree. So, how do we add the remaining codes if they won't fit in that level? Do we just ignore them?

Example

code length | Total Codes | Codes  
3-Bit       |    10       | 25 43 34 53 92 A2 B2 63 73 C2

In the above example if we go by order of constructing symbols/binary string for the code then up 'til A2 we can add codes in the tree at level 3-Bit, but what about B2,63,73,C2 etc? It's not possible to add them at 3-Bit level of the tree? So what do we do with them?

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
user4925
  • 209
  • 4
  • 9
  • At each level, you'll have some terminal codes, and some continuation codes. A terminal code corresponds to an actual value. A continuation code means that you need to read more bits to get the full code. So if you have a 3-bit code, you could have as many as 7 terminal codes, leaving one continuation code. – user3386109 Mar 10 '16 at 22:38
  • [This wikipedia article](https://en.wikipedia.org/wiki/Huffman_coding) may be useful. In order to design a Huffman code for your example, you would need to specify a frequency for each value. And then, only a few values would use a 3-bit code. Some of the values would have to use longer codes. – user3386109 Mar 10 '16 at 23:00
  • 1
    [This](http://stackoverflow.com/questions/1563883/decoding-a-jpeg-huffman-block-table) question/answer may be helpful. – 500 - Internal Server Error Mar 10 '16 at 23:09
  • 1
    Is this a theoretical question, or did you find a JPEG file with this data? If you're asking how to actually decode it in JPEG, then I suspect Mats's answer is not adequate. – user253751 Mar 10 '16 at 23:14
  • @ LuckyAli I can't vote for a duplicate without closing the question. Was the link provided by @500 the answer you were looking for? – user3386109 Mar 10 '16 at 23:22
  • No thats not the answer. Becasue I know how to construct the Binary strings. – user4925 Mar 10 '16 at 23:25

3 Answers3

1

Well, clearly, the absolutely highest number of "things" that can be represented in 3 bits is 8 - (000, 001, 010, 011, 100, 101, 110, 111).

In Huffman encoding, bits represent "left" or "right" in a trie data-structure, to be able to "continue", you have to use SOME codes for "this continues another level", which is why not all 8 values can be encoded in 3 bits. If you have more values to encode, you need to use more bits (for some values - this is the whole point of Huffman coding, that SOME combinations are short, others are longer, and sometimes even longer than the original, but because it's based on what is the most common, it's fine, because they will be rare...)

How to construct and decode a Huffman tree is about four-five pages in your typical Algorithms book, and if you haven't got one of those, you probably want to find one - either a real paper one, or an e-book. There are LOTS of them - I'm not going to recommend one, since the ones I have are all about 15+ years old.

I should add that I think your question is missing something. Clearly, 3 bits can not possibly represent 10 values. And you can't build a [meaningful] Huffman tree on 10 values that all different - unless the idea is to split the values into pairs of {2,5}, {4,3}, {3,4}, {5,3}, {9,2}, {A,2}, {B,2}, {6,3}, {7,3}, {C,2} - which gives a fair number of repeated values - frequency of those are: 2 : 5 3 : 5 4 : 2 5 : 2 6 : 1 7 : 1 9 : 1 A : 1 B : 1 C : 1

But that's stil too many to represent anything meaningful...

Or is it the other way around, that we are supposed to use the bit values of those to decode? In which case we'd need the tree built from the original data to decode it...

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • Yeah, I meant it from a decoding point of view only. You know to construct the Tree to get the binary string/symbols of the codes in the huffman table. – user4925 Mar 10 '16 at 22:58
  • Right, then we need a tree or translation table to translate it - is that the values in Table 1 here: http://www.impulseadventure.com/photo/jpeg-huffman-coding.html – Mats Petersson Mar 10 '16 at 23:01
  • Hey even I follow impulse adventures lol. No I just made those values up. But let's take their example. In the class = 1 ID = 0(AC Table) look at 16 bit length code it has 115 codes. But all of them will not fit in the 16 bit level/depth of the tree. I think only 32 or something will fit. So what do we do with the rest of the (115-32) codes? They can't be added at that level anymore once 32 or so codes are added. – user4925 Mar 10 '16 at 23:05
  • At 16 bits, you could encode 65535 values - but since SOME are used up for shorter codes, there are lots of "holes" on the 16-bit table - but it's clearly sufficient to encode enough values for 115 different "answers". Like I said in my answer, it's a Trie data structure, where each value either ends in a node to go further, or to an encoding. The binary tree representation is quite clear on this. The value 01 is encoded as 00, the value 02 is encoded as 01, 100 makes 03, and so on. – Mats Petersson Mar 10 '16 at 23:12
  • @MatsPetersson This question appears to be asking about the JPEG format *specifically*; this data is stored in the file and JPEG will have a defined procedure for reconstructing the tree. – user253751 Mar 10 '16 at 23:15
  • @immibis: But then the values shouldn't be "just made up", right? It is impossible, any way you like, to encode 10 values into 3 bits. There needs to be more bits SOMEWHERE. – Mats Petersson Mar 10 '16 at 23:27
  • But I willingly admit, I don't know the algorithm for decoding jpeg, and I'm not going to learn enough to answer this question just now... – Mats Petersson Mar 10 '16 at 23:28
  • @Mats Can you take the same example from the impulse link but for 3 bit length take my code values i.e 25 43 34 53 92 A2 B2 63 73 C2 and construct a tree in paper. You will notice that you can add only the first six codes i.e 25,43,34,53,92 and A2 at level 3-Bit. But we can't add B2,63,73,C2. Any my question is what do we do with B2,63,73 and C2 codes(i.e remaining codes). How can we construct Binary String/Symbol for them? – user4925 Mar 10 '16 at 23:32
  • Well, the typical method is to "use more bits". You can get another 8 values into a 4 bit encoding by using 1000-1111. I'm a bit rusty on how to build/decode Huffman - the code I wrote to do that is from about 30 years ago... And I'm pretty sure I don't have it any longer - or if I do, I don't have a floppy drive to read it... – Mats Petersson Mar 10 '16 at 23:45
0

In JPEG, a Huffman code can be up to 16-bits. The DHT market contains an array of 16 elements giving the number of codes for each length.

The JPEG standard explains how to use the code counts to do the Huffman translation. It is one of the few things explained in detail.

This book explains how it is done from a programmers perspective.

JPEG Book

The number of codes that exists at any code length depends upon the counts for other lengths.

I am wondering if you are really looking at the count of codes for length 4 rather than 3.

user3344003
  • 20,574
  • 3
  • 26
  • 62
0

It looks like you're not following the correct procedure when creating your Huffman codes from the JPEG table. The count provided will fit in the number of bits unless the table has been corrupted. The reading out of the codes from a DHT marker is really simple. The more complicated part is how you define your lookup table from that data. A logical (but not practical) way is to create a reverse lookup table that's the maximum code length in size (16-bits = 65536 entries in the table). Then to decode your JPEG data, just pick up 16-bits of compressed data from the input stream and use it as an index in the table where you'll have the symbol and actual length of the code. I came up with a way to use a single, much smaller lookup table. I'm not going to share my specific code table method. What I will share is the basic format of the loop to create the codes from a DHT marker:

int iCurrentCode; // the current Huffman code
int iLength; // the code length in bits that you're working on
int i;
int iCount; // the number of codes defined for this length
int iSymbol; // JPEG symbol defined for each Huffman code
unsigned char *pData; // pointer to the data in the DHT marker

iCurrentCode = 0; // start with a Huffman code of 0
for (iLength = 1; iLength <= 16; iLength++)
{
    iCount = *pData++; // get number of symbols for this bit length
    for (i=0; i<iCount; i++) // read each of the codes for this bit length
    {
        iSymbol = *pData++; // get the JPEG symbol value (e.g. RRRR/SSSS value)
        // It's up to you to create a lookup table from the code and its value
        iCurrentCode++; // the Huffman bit pattern just increments for each code value
    } // for each code defined at this bit length
    iCurrentCode <<= 1; // shift the code left 1 bit to advance to the next bit length
} // for each bit length
BitBank
  • 8,500
  • 3
  • 28
  • 46