7

This code looks obviously incorrect and yet it happily compiles and runs on my machine. Can someone explain how this works? For example, what makes the ")" after the class name valid? What about the random words strewn around?

class M‮{public static void main(String[]a‭){System.out.print(new char[]{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}

Test online: https://ideone.com/t1W5Vm
Source: https://codegolf.stackexchange.com/a/60561

peterh
WoodenKitty
  • Have you tried opening it in a hex editor? There may be some "reverse" characters in there, which make the letters look mirrored. – Bálint Apr 24 '16 at 14:33
  • Yes, there are zero-width Unicode characters that make this *appear* malformed. If you try to indent it properly you'll notice the text flow in confusing ways. – dimo414 Apr 24 '16 at 14:34
  • Peter Lawrey explained it on his blog if I remember correctly. Let me search for it. – Pshemo Apr 24 '16 at 14:35
  • 2
    From the source, there is a comment which states "There is a unicode "[RIGHT-TO-LEFT OVERRIDE](http://www.fileformat.info/info/unicode/char/202e/index.htm)" character just after the M, and the opposite (left to right) just before the a[]" – Jorel Ali Apr 24 '16 at 14:39
  • There it is: http://vanillajava.blogspot.com/2012/09/hidden-code.html also mentioned http://stackoverflow.com/questions/12857340/naming-restrictions-of-variables-in-java/12857471#12857471 – Pshemo Apr 24 '16 at 14:52

3 Answers

9

One way to decipher what is going on is to look at the program character-by-character (demo).

There you may discover that the characters at zero-based positions 7 and 42 are the special Unicode characters RLO (right-to-left override, U+202E) and LRO (left-to-right override, U+202D).
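If you don't have the demo at hand, a few lines of Java can do the same character-by-character inspection. This is only a minimal sketch: the class name `FormatCharScan` and the method `findFormatChars` are made up for illustration, and the source string is a shortened stand-in for the program in the question, with the invisible characters written as Unicode escapes so they show up in the listing.

```java
import java.util.ArrayList;
import java.util.List;

class FormatCharScan {
    // Returns the zero-based indices of all characters whose Unicode
    // general category is FORMAT (Cf), which includes RLO and LRO.
    static List<Integer> findFormatChars(String source) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            if (Character.getType(source.charAt(i)) == Character.FORMAT) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the obfuscated program (assumption: the body is omitted).
        String source = "class M\u202E{public static void main(String[]a\u202D){}}";
        for (int i : findFormatChars(source)) {
            char c = source.charAt(i);
            // Print the index, the code point in hex, and the official Unicode name.
            System.out.printf("index %d: U+%04X %s%n", i, (int) c, Character.getName(c));
        }
    }
}
```

For the string above, the reported indices are 7 and 42, which matches the positions mentioned earlier.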

Once you remove them, the program starts to look normal:

class M{public static void main(String[]a){System.out.print(new char[]{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}

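The obfuscated program compiles because both are Unicode format characters, which the Java compiler treats as ignorable inside identifiers: the RLO is attached to the class name M and the LRO to the parameter name a, so neither changes the meaning of the code.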
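You can check this with the standard `Character.isIdentifierIgnorable` and `Character.isJavaIdentifierPart` methods. The sketch below is illustrative only: `IgnorableDemo` and `stripIgnorable` are hypothetical names, and the source string is a shortened stand-in for the program in the question. Note that stripping ignorable characters from a whole file would also alter string literals and comments, so treat it as a way to see what is hidden rather than a general cleanup tool.

```java
class IgnorableDemo {
    // Removes every character that Java considers ignorable inside an identifier.
    static String stripIgnorable(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (!Character.isIdentifierIgnorable(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char rlo = '\u202E'; // RIGHT-TO-LEFT OVERRIDE
        char lro = '\u202D'; // LEFT-TO-RIGHT OVERRIDE

        // Both are allowed as part of an identifier, but are ignorable within it.
        System.out.println(Character.isJavaIdentifierPart(rlo));
        System.out.println(Character.isIdentifierIgnorable(rlo));
        System.out.println(Character.isIdentifierIgnorable(lro));

        // Shortened stand-in for the obfuscated program (assumption: the body is omitted).
        String source = "class M\u202E{public static void main(String[]a\u202D){}}";
        System.out.println(stripIgnorable(source));
    }
}
```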

Sergey Kalinichenko
1

This is valid Java code, but it uses invisible, zero-width Unicode characters that force right-to-left display (the direction used by scripts such as Arabic). Try to place your cursor in the text and press the right arrow. There's one between "M" and ")", and one between "char[]" and "a[]".

I tried to format the code, but it's just frustrating to navigate in it.

Bálint
1

You will find two Unicode characters in your source, shown here as their UTF-8 byte sequences:

0xE2 0x80 0xAE (U+202E, RIGHT-TO-LEFT OVERRIDE) http://www.fileformat.info/info/unicode/char/202e/index.htm

0xE2 0x80 0xAD (U+202D, LEFT-TO-RIGHT OVERRIDE) http://www.fileformat.info/info/unicode/char/202d/index.htm

Together they cause the part {public static void main(String[]a to be displayed right to left.
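If you want to confirm those byte sequences yourself, encoding the two code points as UTF-8 is enough. A small sketch, where `Utf8Bytes` and `utf8Hex` are names I made up for the example:

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Formats the UTF-8 encoding of a string as space-separated hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to avoid sign extension of bytes >= 0x80.
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u202E")); // prints 0xE2 0x80 0xAE
        System.out.println(utf8Hex("\u202D")); // prints 0xE2 0x80 0xAD
    }
}
```

Searching the raw file for these three-byte sequences in a hex editor is another way to locate the hidden characters.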

revau.lt