I've found this code for swapping case, but I'm a bit confused about how it works.

class Main {
  private static String swapCase(String s) {
      String r = "";
      for (char c : s.toCharArray())
          r += c ^= 32; // this line
      return r;
  }

  public static void main(String[] args) {
    System.out.println(swapCase("Hello"));
  }
}

I understand that it loops over each character, but I can't wrap my head around this line (especially the XOR operator):

r += c ^= 32; 

What's the significance of 32? How does it swap the case?

halfer
vrintle
  • This implementation is not a particularly good way of doing it, anyway. Very inefficient. – Andy Turner May 13 '19 at 18:52
  • @Andy, yes indeed. Basically, it was a codegolf. – vrintle May 13 '19 at 18:53
  • And, it works only for the letters and almost only the letters in the [C0 Controls and Basic Latin](http://www.unicode.org/charts/nameslist/index.html) block, which is a rather unrealistic constraint even on things called by the English term "word". – Tom Blodget May 13 '19 at 22:16

3 Answers

This is how ASCII was set up.

The letters a-z have the 6th bit (counting from 1 at the least significant bit) set to 1, while the letters A-Z have it set to 0.

32 = 100000 // the 6-th bit is set to 1

XORing a character with 32 inverts that 6th bit.

You could do a little debugging and see for yourself:

for (char c : s.toCharArray()) {
    System.out.println(Integer.toBinaryString((int) c)); // bits before the flip
    c ^= 32; // this line
    System.out.println(Integer.toBinaryString((int) c)); // bits after the flip
}
Eugene

In ASCII, 32 is the difference between the code of a lowercase letter and the code of its uppercase counterpart. It's a power of two, so its binary representation has a single 1-bit: 0010 0000.

By applying the XOR assignment operator, you flip this bit in the character value, which effectively adds 32 (if the bit is 0 in c) or subtracts 32 (if the bit is 1 in c).
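
To make the add-or-subtract reading concrete, here is a minimal sketch. The class name XorVersusArithmetic and the helper flip are made up for illustration and are not part of the question's code.

```java
class XorVersusArithmetic {
    // Hypothetical helper: flips the bit with value 32 in the character code.
    static char flip(char c) {
        return (char) (c ^ 32);
    }

    public static void main(String[] args) {
        // 'A' is 65, so the bit is clear: XOR sets it, which equals adding 32.
        System.out.println(flip('A') == (char) ('A' + 32)); // true
        // 'a' is 97, so the bit is set: XOR clears it, which equals subtracting 32.
        System.out.println(flip('a') == (char) ('a' - 32)); // true
    }
}
```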

This works fine for the letters A-Z and a-z but will most likely produce nonsense for most other characters in the input.
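
One possible way to avoid the nonsense is to flip the bit only for ASCII letters and leave everything else alone. This is just a sketch under that assumption; the class GuardedSwapCase and the method swapAsciiCase are names invented here, and the code deliberately ignores non-ASCII letters.

```java
class GuardedSwapCase {
    // Hypothetical guarded variant: only A-Z and a-z get their case bit flipped.
    static String swapAsciiCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            boolean asciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            sb.append(asciiLetter ? (char) (c ^ 32) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unguarded XOR on a non-letter: '[' is 91, and 91 ^ 32 is 123, which is '{'.
        System.out.println((char) ('[' ^ 32)); // {
        // The guarded version leaves punctuation, spaces and digits untouched.
        System.out.println(swapAsciiCase("Hello, World 123")); // hELLO, wORLD 123
    }
}
```

Using a StringBuilder here also sidesteps the repeated string concatenation in the original loop.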

Jochen Reinhardt

Look at this table and you will see why:

a = 01100001    A = 01000001 
b = 01100010    B = 01000010 
c = 01100011    C = 01000011 
d = 01100100    D = 01000100 
e = 01100101    E = 01000101 
f = 01100110    F = 01000110 
g = 01100111    G = 01000111 
h = 01101000    H = 01001000 
i = 01101001    I = 01001001 
j = 01101010    J = 01001010 
k = 01101011    K = 01001011 
l = 01101100    L = 01001100 
m = 01101101    M = 01001101 
n = 01101110    N = 01001110 
o = 01101111    O = 01001111 
p = 01110000    P = 01010000 
q = 01110001    Q = 01010001 
r = 01110010    R = 01010010 
s = 01110011    S = 01010011 
t = 01110100    T = 01010100 
u = 01110101    U = 01010101 
v = 01110110    V = 01010110 
w = 01110111    W = 01010111 
x = 01111000    X = 01011000 
y = 01111001    Y = 01011001 
z = 01111010    Z = 01011010 

The only difference between the uppercase and lowercase versions is bit 5 (counting from 0 at the least significant bit). That's why a simple XOR mask can switch the case back and forth.
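
The "back and forth" part can be checked with a short sketch. CaseMaskDemo, CASE_BIT and toggle are illustrative names chosen here, not anything from the question.

```java
class CaseMaskDemo {
    // Bit 5, counting from 0, has the value 32.
    static final int CASE_BIT = 1 << 5;

    // Hypothetical helper: XOR with the mask toggles only bit 5.
    static char toggle(char c) {
        return (char) (c ^ CASE_BIT);
    }

    public static void main(String[] args) {
        // XORing the two cases of a letter leaves exactly the differing bit.
        System.out.println('a' ^ 'A'); // 32
        char swapped = toggle('q');    // 'Q'
        char back = toggle(swapped);   // applying the mask twice restores 'q'
        System.out.println(swapped + " " + back); // Q q
    }
}
```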

thanh ngo