How does this print "hello world"?

Question

I discovered this oddity:

for (long l = 4946144450195624l; l > 0; l >>= 5)
    System.out.print((char) (((l & 31 | 64) % 95) + 32));

Output:

hello world

How does this work?

This question appears to be off-topic because it is about code obfuscation. — Blazemonger, Jan 02 '14 at 16:03

score 264 · Accepted Answer · edited May 03 '22 at 12:33

The number 4946144450195624 fits 64 bits, and its binary representation is:

 10001100100100111110111111110111101100011000010101000

The program decodes a character for every 5-bits group, from right to left

 00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
   d  |  l  |  r  |  o  |  w  |     |  o  |  l  |  l  |  e  |  h

5-bit codification

For 5 bits, it is possible to represent 2⁵ = 32 characters. The English alphabet contains 26 letters, and this leaves room for 32 - 26 = 6 symbols apart from letters. With this codification scheme, you can have all 26 (one case) English letters and 6 symbols (space being among them).

Algorithm description

The >>= 5 in the for loop jumps from group to group, and then the 5-bits group gets isolated ANDing the number with the mask 31₁₀ = 11111₂ in the sentence l & 31.

Now the code maps the 5-bit value to its corresponding 7-bit ASCII character. This is the tricky part. Check the binary representations for the lowercase alphabet letters in the following table:

  ASCII   |     ASCII     |    ASCII     |    Algorithm
character | decimal value | binary value | 5-bit codification
--------------------------------------------------------------
  space   |       32      |   0100000    |      11111
    a     |       97      |   1100001    |      00001
    b     |       98      |   1100010    |      00010
    c     |       99      |   1100011    |      00011
    d     |      100      |   1100100    |      00100
    e     |      101      |   1100101    |      00101
    f     |      102      |   1100110    |      00110
    g     |      103      |   1100111    |      00111
    h     |      104      |   1101000    |      01000
    i     |      105      |   1101001    |      01001
    j     |      106      |   1101010    |      01010
    k     |      107      |   1101011    |      01011
    l     |      108      |   1101100    |      01100
    m     |      109      |   1101101    |      01101
    n     |      110      |   1101110    |      01110
    o     |      111      |   1101111    |      01111
    p     |      112      |   1110000    |      10000
    q     |      113      |   1110001    |      10001
    r     |      114      |   1110010    |      10010
    s     |      115      |   1110011    |      10011
    t     |      116      |   1110100    |      10100
    u     |      117      |   1110101    |      10101
    v     |      118      |   1110110    |      10110
    w     |      119      |   1110111    |      10111
    x     |      120      |   1111000    |      11000
    y     |      121      |   1111001    |      11001
    z     |      122      |   1111010    |      11010

Here you can see that the ASCII characters, we want to map, begin with the 7th and 6th bit set (11xxxxx₂) (except for space, which only has the 6th bit on). You could OR the 5-bit codification with 96 (96₁₀ = 1100000₂) and that should be enough to do the mapping, but that wouldn't work for space (darn space!).

Now we know that special care has to be taken to process space at the same time as the other characters. To achieve this, the code turns the 7th bit on (but not the 6th) on the extracted 5-bit group with an OR 64 64₁₀ = 1000000₂ (l & 31 | 64).

So far the 5-bit group is of the form: 10xxxxx₂ (space would be 1011111₂ = 95₁₀).

If we can map space to 0 unaffecting other values, then we can turn the 6th bit on and that should be all.

Here is what the mod 95 part comes to play. Space is 1011111₂ = 95₁₀, using the modulus operation (l & 31 | 64) % 95). Only space goes back to 0, and after this, the code turns the 6th bit on by adding 32₁₀ = 100000₂ to the previous result, ((l & 31 | 64) % 95) + 32), transforming the 5-bit value into a valid ASCII character.

isolates 5 bits --+          +---- takes 'space' (and only 'space') back to 0
                  |          |
                  v          v
               (l & 31 | 64) % 95) + 32
                       ^           ^
       turns the       |           |
      7th bit on ------+           +--- turns the 6th bit on

The following code does the inverse process, given a lowercase string (maximum 12 characters), returns the 64-bit long value that could be used with the OP's code:

public class D {
    public static void main(String... args) {
        String v = "hello test";
        int len = Math.min(12, v.length());
        long res = 0L;
        for (int i = 0; i < len; i++) {
            long c = (long) v.charAt(i) & 31;
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        System.out.println(res);
    }
}

This answer leaves no mystery. Rather, it does your thinking for you. — , Dec 25 '13 at 10:29

score 40 · Answer 2 · edited May 03 '22 at 12:35

The following Groovy script prints intermediate values.

String getBits(long l) {
    return Long.toBinaryString(l).padLeft(8, '0');
}

for (long l = 4946144450195624l; l > 0; l >>= 5) {
    println ''
    print String.valueOf(l).toString().padLeft(16, '0')
    print '|' + getBits((l & 31))
    print '|' + getBits(((l & 31 | 64)))
    print '|' + getBits(((l & 31 | 64) % 95))
    print '|' + getBits(((l & 31 | 64) % 95 + 32))

    print '|';
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
}

Here it is:

4946144450195624|00001000|01001000|01001000|01101000|h
0154567014068613|00000101|01000101|01000101|01100101|e
0004830219189644|00001100|01001100|01001100|01101100|l
0000150944349676|00001100|01001100|01001100|01101100|l
0000004717010927|00001111|01001111|01001111|01101111|o
0000000147406591|00011111|01011111|00000000|00100000|
0000000004606455|00010111|01010111|01010111|01110111|w
0000000000143951|00001111|01001111|01001111|01101111|o
0000000000004498|00010010|01010010|01010010|01110010|r
0000000000000140|00001100|01001100|01001100|01101100|l
0000000000000004|00000100|01000100|01000100|01100100|d

score 26 · Answer 3 · edited May 03 '22 at 12:16

Interesting!

Standard ASCII characters which are visible are in range of 32 to 127.

That's why you see 32 and 95 (127 - 32) there.

In fact, each character is mapped to 5 bits here, (you can find what is 5 bit combination for each character), and then all bits are concatenated to form a large number.

Positive longs are 63 bit numbers, large enough to hold encrypted form of 12 characters. So it is large enough to hold Hello word, but for larger texts you shall use larger numbers, or even a BigInteger.

In an application we wanted to transfer visible English characters, Persian characters and symbols via SMS. As you see, there are 32 (number of Persian characters) + 95 (number of English characters and standard visible symbols) = 127 possible values, which can be represented with 7 bits.

We converted each UTF-8 (16 bit) character to 7 bits, and gain more than a 56% compression ratio. So we could send texts with twice the length in the same number of SMSes. (Somehow, the same thing happened here.)

There's a lot more going on in OP's code. For instance, this doesn't really explain what the `| 64` is doing. — Ted Hopp, Dec 24 '13 at 21:43
@Amir: actually 95 is there because you need to get a space character. — Bee, Dec 29 '13 at 21:10

score 17 · Answer 4 · edited Dec 21 '13 at 15:17

17

You are getting a result which happens to be char representation of below values

104 -> h
101 -> e
108 -> l
108 -> l
111 -> o
32  -> (space)
119 -> w
111 -> o
114 -> r
108 -> l
100 -> d

edited Dec 21 '13 at 15:17

Amir Pashazadeh

7,170
3
39
69

answered Dec 21 '13 at 14:46

Vikas V

3,176
2
37
60

score 16 · Answer 5 · edited May 03 '22 at 12:25

You've encoded characters as 5-bit values and packed 11 of them into a 64 bit long.

(packedValues >> 5*i) & 31 is the i-th encoded value with a range 0-31.

The hard part, as you say, is encoding the space. The lowercase English letters occupy the contiguous range 97-122 in Unicode (and ASCII, and most other encodings), but the space is 32.

To overcome this, you used some arithmetic. ((x+64)%95)+32 is almost the same as x + 96 (note how bitwise OR is equivalent to addition, in this case), but when x=31, we get 32.

score 6 · Answer 6 · edited May 03 '22 at 12:36

6

It prints "hello world" for a similar reason this does:

for (int k=1587463874; k>0; k>>=3)
    System.out.print((char) (100 + Math.pow(2,2*(((k&7^1)-1)>>3 + 1) + (k&7&3)) + 10*((k&7)>>2) + (((k&7)-7)>>3) + 1 - ((-(k&7^5)>>3) + 1)*80));

But for a somewhat different reason than this:

for (int k=2011378; k>0; k>>=2)
    System.out.print((char) (110 + Math.pow(2,2*(((k^1)-1)>>21 + 1) + (k&3)) - ((k&8192)/8192 + 7.9*(-(k^1964)>>21) - .1*(-((k&35)^35)>>21) + .3*(-((k&120)^120)>>21) + (-((k|7)^7)>>21) + 9.1)*10));

edited May 03 '22 at 12:36

Peter Mortensen

30,738
21
105
131

answered Dec 25 '13 at 05:03

גלעד ברקן

23,602
3
25
61

22

You should explain what you're doing, instead of posting another riddle – Aleksandr Dubinsky Dec 25 '13 at 20:30
@AleksandrDubinsky I don't think so. For one thing, what I'm doing has been described in other answers on this page. And secondly, I believe some people might find it fun and helpful to learn from thinking and figuring it out for themselves. – גלעד ברקן Apr 29 '14 at 22:33
1

I suggest that you invest some effort in finding a site (perhaps some Beta StackExchange?) where contributing fun riddles is welcome. Stack Overflow is a Q&A site with a strictly enforced focus. – Marko Topolnik Apr 30 '14 at 12:33
1

@MarkoTopolnik I would hate to live in a world where all rules or focus were so strictly enforced as to never allow any exceptions. Not to mention that there are countless such exceptions on SO. – גלעד ברקן Apr 30 '14 at 21:05
1

I would, too, but SO is such a world to an unusually great extent. Sure there are exceptions even here, but they are not *welcome*. – Marko Topolnik May 01 '14 at 05:53
@MarkoTopolnik How, specifically, does my answer diverge from SO's focus (please refer me to text published by SO)? There are many answers that leave parts as "exercises for the reader" and are greatly up-voted and accepted. How is my answer different than those, given that there is already an explanation on this page? People come to SO to learn new things. In my opinion, one can sometimes learn a lot by thinking about problems that are not totally explained. I added this answer because I believe it adds additional variety to the OPs example. – גלעד ברקן May 01 '14 at 12:02
@MarkoTopolnik You are also ignoring that Bohemian's stated purpose (see his comments) in asking this question was not to understand the answer himself, since he knew it, but rather to provide a "satisfactory explanation" and to garner up-votes for his question. I approached this answer conceptually, to provide interest and variety to a question that was already framed as a kind of "game." I answered this question in the context of this specific situation, and seven people up-voted my answer, which means that they found value in it for them. – גלעד ברקן May 01 '14 at 12:16
1

Another 15 shared Alexandr's sentiment. And you are right in pointing out that the question itself is inappropriate for SO, as commented below it. – Marko Topolnik May 03 '14 at 10:46
@MarkoTopolnik You pointed out that 15 people would have preferred me to adjust my answer. As you probably know, it's a tall order to please everyone in this world. To me, it's more important to have seemed helpful to seven people. (I also doubt that my answer caused any significant distress to the 15 people you mention; after all, they can readily read the explanation in the accepted answer.) – גלעד ברקן May 05 '14 at 01:03
You are quite wrong in assuming that your answer is here to please or that anyone's distress has anything to do with its appropriateness for SO. I pointed out that, as a sample of community's appraisal of your answer, those in favor are outnumbered by a factor of two by those opposing it. You should also take note that answers are not a thread of discussion and there is no "context" of other answers to call upon. – Marko Topolnik May 05 '14 at 12:23
1

@MarkoTopolnik I don't think a critical comment means "opposing" the answer. I think it means hoping the answer were different in some way. And I disagree about there being "no context." I think there is. – גלעד ברקן May 05 '14 at 15:49
1

Not hoping, *expecting*. As for context, I am just informing you of SO policy; you are free to disagree and wish for policies to be different, but you are not free to ignore tthem as an answerer. – Marko Topolnik May 05 '14 at 16:21
@MarkoTopolnik I would (and did...) use the word, "hoping," rather than "expecting." Marko, can you please refer me to the posted/quoted text of the policy you are referring to? – גלעד ברקן May 05 '14 at 18:10
The suspense! What is the reason? – Peter Mortensen May 03 '22 at 12:37

score 3 · Answer 7 · edited May 03 '22 at 14:14

I mostly work with Oracle databases, so I would use some Oracle knowledge to interpret and explain :-)

Let's convert the number 4946144450195624 into binary. For that I use a small function called dec2bin, i.e., decimal-to-binary.

SQL> CREATE OR REPLACE FUNCTION dec2bin (N in number) RETURN varchar2 IS
  2    binval varchar2(64);
  3    N2     number := N;
  4  BEGIN
  5    while ( N2 > 0 ) loop
  6       binval := mod(N2, 2) || binval;
  7       N2 := trunc( N2 / 2 );
  8    end loop;
  9    return binval;
 10  END dec2bin;
 11  /

Function created.

SQL> show errors
No errors.
SQL>

Let's use the function to get the binary value -

SQL> SELECT dec2bin(4946144450195624) FROM dual;

DEC2BIN(4946144450195624)
--------------------------------------------------------------------------------
10001100100100111110111111110111101100011000010101000

SQL>

Now the catch is the 5-bit conversion. Start grouping from right to left with 5 digits in each group. We get:

100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000

We would be finally left with just 3 digits in the end at the right. Because, we had total 53 digits in the binary conversion.

SQL> SELECT LENGTH(dec2bin(4946144450195624)) FROM dual;

LENGTH(DEC2BIN(4946144450195624))
---------------------------------
                               53

SQL>

hello world has a total of 11 characters (including space), so we need to add two bits to the last group where we were left with just three bits after grouping.

So, now we have:

00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000

Now, we need to convert it to 7-bit ASCII value. For the characters it is easy; we need to just set the 6th and 7th bit. Add 11 to each 5-bit group above to the left.

That gives:

1100100|1101100|1110010|1101111|1110111|1111111|1101111|1101100|1101100|1100101|1101000

Let's interpret the binary values. I will use the binary to decimal conversion function.

SQL> CREATE OR REPLACE FUNCTION bin2dec (binval in char) RETURN number IS
  2    i                 number;
  3    digits            number;
  4    result            number := 0;
  5    current_digit     char(1);
  6    current_digit_dec number;
  7  BEGIN
  8    digits := length(binval);
  9    for i in 1..digits loop
 10       current_digit := SUBSTR(binval, i, 1);
 11       current_digit_dec := to_number(current_digit);
 12       result := (result * 2) + current_digit_dec;
 13    end loop;
 14    return result;
 15  END bin2dec;
 16  /

Function created.

SQL> show errors;
No errors.
SQL>

Let's look at each binary value -

SQL> set linesize 1000
SQL>
SQL> SELECT bin2dec('1100100') val,
  2    bin2dec('1101100') val,
  3    bin2dec('1110010') val,
  4    bin2dec('1101111') val,
  5    bin2dec('1110111') val,
  6    bin2dec('1111111') val,
  7    bin2dec('1101111') val,
  8    bin2dec('1101100') val,
  9    bin2dec('1101100') val,
 10    bin2dec('1100101') val,
 11    bin2dec('1101000') val
 12  FROM dual;

       VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL     VAL           VAL
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       100        108        114        111        119        127        111        108        108     101           104

SQL>

Let's look at what characters they are:

SQL> SELECT chr(bin2dec('1100100')) character,
  2    chr(bin2dec('1101100')) character,
  3    chr(bin2dec('1110010')) character,
  4    chr(bin2dec('1101111')) character,
  5    chr(bin2dec('1110111')) character,
  6    chr(bin2dec('1111111')) character,
  7    chr(bin2dec('1101111')) character,
  8    chr(bin2dec('1101100')) character,
  9    chr(bin2dec('1101100')) character,
 10    chr(bin2dec('1100101')) character,
 11    chr(bin2dec('1101000')) character
 12  FROM dual;

CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER
--------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
d         l         r         o         w         ⌂         o         l         l         e         h

SQL>

So, what do we get in the output?

d l r o w ⌂ o l l e h

That is hello⌂world in reverse. The only issue is the space. And the reason is well explained by @higuaro in his answer. I honestly couldn't interpret the space issue myself at first attempt, until I saw the explanation given in his answer.

slevy1 · Answer 8 · 2015-11-24T21:16:07.837

1

I found the code slightly easier to understand when translated into PHP, as follows:

<?php

$result=0;
$bignum = 4946144450195624;
for (; $bignum > 0; $bignum >>= 5){
    $result = (( $bignum & 31 | 64) % 95) + 32;
    echo chr($result);
}

See live code

edited Nov 24 '15 at 21:16

answered Dec 31 '14 at 21:29

slevy1

3,797
2
27
33

*Why* is it easier to understand? Can you elaborate? – Peter Mortensen May 03 '22 at 12:47
How does it answer the question? It was: *"How does this work?"* – Peter Mortensen May 03 '22 at 12:48

score 0 · Answer 9 · edited May 03 '22 at 12:40

0

Use

out.println((char) (((l & 31 | 64) % 95) + 32 / 1002439 * 1002439));

to make it capitalised.

edited May 03 '22 at 12:40

Peter Mortensen

30,738
21
105
131

answered Dec 05 '14 at 18:47

1

consider adding some explanation about what are you doing and why. – fedorqui Dec 05 '14 at 19:14

How does this print "hello world"?

9 Answers9

5-bit codification

Algorithm description

Linked