XOR operation with two strings in java

Question

How do I perform a bitwise XOR operation on two strings in Java?

You need to refine your question. What result are you expecting? Could you provide an example? — ChrisJ, Feb 26 '11 at 11:31
I am interested in what you want to achieve. Maybe some kind of encryption? :) — Daniel, Feb 26 '11 at 11:33
you can use Java Cryptography API http://download.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html — Dead Programmer, Feb 26 '11 at 12:37

score 55 · Answer 1 · answered Nov 02 '11 at 16:06

You want something like this:

import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
import java.io.IOException;

public class StringXORer {

    public String encode(String s, String key) {
        return base64Encode(xorWithKey(s.getBytes(), key.getBytes()));
    }

    public String decode(String s, String key) {
        return new String(xorWithKey(base64Decode(s), key.getBytes()));
    }

    private byte[] xorWithKey(byte[] a, byte[] key) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ key[i%key.length]);
        }
        return out;
    }

    private byte[] base64Decode(String s) {
        try {
            BASE64Decoder d = new BASE64Decoder();
            return d.decodeBuffer(s);
        } catch (IOException e) {throw new RuntimeException(e);}
    }

    private String base64Encode(byte[] bytes) {
        BASE64Encoder enc = new BASE64Encoder();
        return enc.encode(bytes).replaceAll("\\s", "");

    }
}

The result is Base64-encoded because XORing the bytes of a string does not necessarily give back bytes that form a valid string.

Great answer! But readers should make sure to use [`java.util.Base64`](http://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) instead of the [soon-to-be-unreachable classes from `sun.misc`](http://blog.codefx.org/java/dev/how-java-9-and-project-jigsaw-may-break-your-code/#Internal-APIs-Become-Unavailable). — Nicolai Parlog, Sep 04 '15 at 13:44
I used this sample with android.Base64 instead of sun: import android.util.Base64; also these two methods changed to this: private byte[] base64Decode(String s) { try { return Base64.decode(s,Base64.DEFAULT); } catch (IllegalArgumentException e) {throw new RuntimeException(e);} } private String base64Encode(byte[] bytes) { return Base64.encodeToString(bytes,Base64.DEFAULT).replaceAll("\\s", ""); } — JohnC, Nov 30 '16 at 23:54
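Following the comment about `java.util.Base64`, here is a minimal sketch of the same idea without the `sun.misc` classes. The class name `Base64StringXorer` and the explicit use of UTF-8 in `getBytes` are assumptions made for this illustration, not part of the original answer. An empty key is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical class name; the structure mirrors StringXORer above.
class Base64StringXorer {

    static String encode(String s, String key) {
        byte[] xored = xorWithKey(s.getBytes(StandardCharsets.UTF_8),
                                  key.getBytes(StandardCharsets.UTF_8));
        // The basic encoder emits no line separators, so no whitespace stripping is needed.
        return Base64.getEncoder().encodeToString(xored);
    }

    static String decode(String s, String key) {
        byte[] xored = Base64.getDecoder().decode(s);
        return new String(xorWithKey(xored, key.getBytes(StandardCharsets.UTF_8)),
                          StandardCharsets.UTF_8);
    }

    // XOR every input byte with the key byte at the same position, wrapping around the key.
    private static byte[] xorWithKey(byte[] a, byte[] key) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        String encoded = encode("hello \u00e6 \u20ac", "secret");
        System.out.println(encoded);
        // Decoding with the same key restores the original text.
        System.out.println(decode(encoded, "secret").equals("hello \u00e6 \u20ac"));
    }
}
```

Naming the charset explicitly keeps the result independent of the platform default, which matters when the encoded text is produced on one machine and decoded on another.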

Peter Lawrey · Answer 2 · 2014-02-21T11:13:47.273

27

Note: this only works for low characters, i.e. those below 0x8000. It works for all ASCII characters.

I would XOR each charAt() value to create a new String, like this:

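String s, key; // the input and the key; both must be assigned, and key must not be empty

StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++)
    // XOR each char with the key char at the same position, wrapping around the key
    sb.append((char) (s.charAt(i) ^ key.charAt(i % key.length())));
String result = sb.toString();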

In response to @user467257's comment

If your input/output is utf-8 and you xor "a" and "æ", you are left with an invalid utf-8 string consisting of one character (decimal 135, a continuation character).

It is the char values which are being XORed, not the byte values, and this produces a character which can be UTF-8 encoded.

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public static void main(String... args) throws UnsupportedEncodingException {
    char ch1 = 'a';
    char ch2 = 'æ';
    char ch3 = (char) (ch1 ^ ch2); // XOR of the two char values, not of their UTF-8 bytes
    System.out.println((int) ch3 + " UTF-8 encoded is " + Arrays.toString(String.valueOf(ch3).getBytes("UTF-8")));
}

prints

135 UTF-8 encoded is [-62, -121]

edited Feb 21 '14 at 11:13

answered Feb 26 '11 at 11:33

Peter Lawrey


I check for `i – Peter Lawrey Jul 26 '12 at 19:02
Firstly, the string produced is not properly xor'd in the sense that you cannot get your original string back by xor'ing it with the key again (unless your key was guaranteed to be equal to or longer than the messages which would be very strange) making the code completely misrepresent the concept of xor'ing. Secondly, you are not guaranteed to get valid string bytes by simply xoring characters, so your output string may contain invalid byte sequences. – user467257 Nov 12 '12 at 10:26
@user467257 That is true, I suggest you add an answer with how you would solve these issues. – Peter Lawrey Nov 12 '12 at 10:31
Your edit has made the code worse. Input s1 = XY, s2 = ABCD will now produce exactly the same output as input of s1 = XYXY, s2 = ABCD. so two different strings XY and XYXY would produce exactly the same when xor'd once against ABCD, meaning that double application will not give you your original s1 back if s1 shorter than s2, again breaking the concept of xor'ing. – user467257 Nov 12 '12 at 11:54
@user467257 fair point, I have made it asymmetric so the length is the length of the original string. This will work provided chars < 32768 are used. – Peter Lawrey Nov 12 '12 at 12:15
"This will work provided chars < 32768 are used" - do you really know what you are saying there ? As a simple example, if your input/output is utf-8 and you xor "a" and "æ", you are left with an invalid utf-8 string consisting of one character (decimal 135, a continuation character). If you were using your returned xor'd strings as part of a url for anchor tags for example, you would occasionally generate hrefs that did not work. – user467257 Nov 15 '12 at 15:54
@user467257 I think you are confusing `char` and `byte` which are not the same thing. I have updated my answer with a reply to your comment. – Peter Lawrey Nov 15 '12 at 16:16
I deleted my two comments because there were too many inaccuracies. I think "insertion" of the extra byte effectively happens at the point of casting to a char (because the char will be pointing at the codepoint with the two byte utf-8 representation). I think I can come up with a better example of failure of char-wise xoring though, will think about it over the weekend. – user467257 Nov 16 '12 at 09:35
@PeterLawrey Thank you for your answer. Wanted a simple solution how to XOR two strings in JAVA. Your code helped me to understand it properly. – Karthik N G Jul 11 '13 at 09:48
The concern of user467257 is actually true, just that he is using a counter example for UTF-8, while Java String uses UTF-16. For UTF-16, we just need to find 2 characters that xor together to produce surrogate. For example: `'\u11b0' ^ '\uc810'`. If you use `getBytes` on a string that has unpaired surrogate, it will produce `?` for UTF-8 and REPLACEMENT CHARACTER `\ufffd` for UTF-16. – nhahtdh Feb 12 '14 at 15:33
@nhahtdh Good point, xor'ing two characters together can produce a character which is not valid. There are limitations as to when XOR would work, but I am not sure it makes sense in the first place. – Peter Lawrey Feb 12 '14 at 17:04
Would you please edit your post to put up some sort of warning? – nhahtdh Feb 21 '14 at 09:19
@PeterLawrey There are only limitations when you xor char by char as your answer proposes. It is a hack solution, ready to trap the unwary. The better approach is to xor byte by byte, base64 (or other) encode the result to ensure printability/readability, then reverse those steps to decode. – user467257 Jun 20 '14 at 09:38
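The surrogate case raised by nhahtdh can be checked directly. This is only a small sketch: the class and method names are made up for the illustration, and the two input characters are the ones quoted in the comment above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the unpaired-surrogate case from the comments.
class SurrogateXorDemo {

    // XOR two chars the way the char-wise approach does.
    static char xorChars(char a, char b) {
        return (char) (a ^ b);
    }

    // Encode to UTF-8 and decode again; an unpaired surrogate does not survive this.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char c = xorChars('\u11b0', '\uc810');            // 0x11b0 ^ 0xc810 = 0xd9a0
        System.out.println(Integer.toHexString(c));       // prints d9a0
        System.out.println(Character.isSurrogate(c));     // prints true
        String s = String.valueOf(c);
        System.out.println(utf8RoundTrip(s).equals(s));   // prints false
    }
}
```

The String holding the lone surrogate is still usable inside the JVM; the information is lost only once it is converted to bytes in an encoding such as UTF-8.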

Paŭlo Ebermann · Answer 3 · 2011-02-26T22:04:42.887

Pay attention:

A Java char corresponds to a UTF-16 code unit, and in some cases two consecutive chars (a so-called surrogate pair) are needed for one real Unicode character (codepoint).

XORing two valid UTF-16 sequences (i.e. Java Strings char by char, or byte by byte after encoding to UTF-16) does not necessarily give you another valid UTF-16 string - you may have unpaired surrogates as a result. (It would still be a perfectly usable Java String, just the codepoint-concerning methods could get confused, and the ones that convert to other encodings for output and similar.)

The same holds if you first convert your Strings to UTF-8 and then XOR these bytes - here you quite probably will end up with a byte sequence which is not valid UTF-8, if your Strings were not already both pure ASCII strings.

Even if you try to do it right and iterate over your two Strings by codepoint and try to XOR the codepoints, you can end up with codepoints outside the valid range: for example, U+FFFFF (plane 15) XOR U+100000 (plane 16) = U+1FFFFF (which would be the last character of plane 31), way above the range of existing codepoints. You could also end up this way with codepoints reserved for surrogates (which are not valid characters).
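As a quick check of the out-of-range case, the sketch below XORs U+FFFFF with U+100000 (the first code point of plane 16) and asks `Character.isValidCodePoint` about the result. The class and method names are hypothetical.

```java
// Hypothetical demo class: XOR of two valid code points can leave the Unicode range.
class CodePointXorDemo {

    static int xorCodePoints(int a, int b) {
        return a ^ b;
    }

    public static void main(String[] args) {
        int result = xorCodePoints(0xFFFFF, 0x100000);
        System.out.println(Character.isValidCodePoint(0xFFFFF));   // prints true
        System.out.println(Character.isValidCodePoint(0x100000));  // prints true
        System.out.println(Integer.toHexString(result));           // prints 1fffff
        System.out.println(Character.isValidCodePoint(result));    // prints false
    }
}
```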

If your strings only contain chars < 128, 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768, then the (char-wise) XORed strings will be in the same range, and thus certainly not contain any surrogates. In the first two cases you could also encode your String as ASCII or Latin-1, respectively, and have the same XOR-result for the bytes. (You still can end up with control chars, which may be a problem for you.)
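The reason the limits above are all powers of two is that XOR never sets a bit that is clear in both operands. A brute-force sketch (hypothetical class and method names) makes this visible, and also shows that a limit that is not a power of two gives no such guarantee:

```java
// Hypothetical helper: checks by brute force whether XOR stays below a limit.
class XorRangeCheck {

    // True if every pair of values below 'limit' XORs to a value that is also below 'limit'.
    static boolean closedUnderXor(int limit) {
        for (int a = 0; a < limit; a++) {
            for (int b = 0; b < limit; b++) {
                if ((a ^ b) >= limit) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(closedUnderXor(128));   // prints true
        System.out.println(closedUnderXor(2048));  // prints true
        System.out.println(closedUnderXor(100));   // prints false, e.g. 64 ^ 36 = 100
    }
}
```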

What I'm finally saying here: don't expect the result of encrypting Strings to be a valid string again - instead, simply store and transmit it as a byte[] (or a stream of bytes). (And yes, convert to UTF-8 before encrypting, and from UTF-8 after decrypting).

What Java is using internally is irrelevant. As a user you can either access each *char* (with surrogates issues of course) or each codepoint. Whether Java uses internally UTF-16 or the colors of the moonboots little fairies are wearing has nothing to do with the question. — SyntaxT3rr0r, Feb 26 '11 at 14:06
@SyntaxT3rr0r: Okay, maybe not optimally worded, I'm trying to edit this. — Paŭlo Ebermann, Feb 26 '11 at 21:41
@SyntaxT3rr0r: XORing by codepoint does not help, either (see example now in the answer). — Paŭlo Ebermann, Feb 26 '11 at 22:05
+1 - I agree with Paulo. XOR-ing is liable to destroy the properties that make a Java String a valid UTF-16 String. If you do that, they become impossible to encode / decode. — Stephen C, Feb 26 '11 at 22:53

score 4 · Answer 4 · answered Jun 29 '17 at 11:02

This solution is compatible with Android (I've tested and used it myself). Thanks to @user467257 whose solution I adapted this from.

import android.util.Base64;

public class StringXORer {

public String encode(String s, String key) {
    return new String(Base64.encode(xorWithKey(s.getBytes(), key.getBytes()), Base64.DEFAULT));
}

public String decode(String s, String key) {
    return new String(xorWithKey(base64Decode(s), key.getBytes()));
}

private byte[] xorWithKey(byte[] a, byte[] key) {
    byte[] out = new byte[a.length];
    for (int i = 0; i < a.length; i++) {
        out[i] = (byte) (a[i] ^ key[i%key.length]);
    }
    return out;
}

private byte[] base64Decode(String s) {
    return Base64.decode(s,Base64.DEFAULT);
}

private String base64Encode(byte[] bytes) {
    return new String(Base64.encode(bytes,Base64.DEFAULT));

}
}

Thanks! A couple of notes: `base64Encode()` is not used anywhere, and better use `Base64.NO_WRAP` for encoding to avoid line breaks. — gmk57, Feb 21 '20 at 13:22

score 3 · Answer 5 · answered Dec 26 '12 at 16:02

This is the code I'm using:

private static byte[] xor(final byte[] input, final byte[] secret) {
    final byte[] output = new byte[input.length];
    if (secret.length == 0) {
        throw new IllegalArgumentException("empty security key");
    }
    int spos = 0;
    for (int pos = 0; pos < input.length; ++pos) {
        output[pos] = (byte) (input[pos] ^ secret[spos]);
        ++spos;
        if (spos >= secret.length) {
            spos = 0;
        }
    }
    return output;
}
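A short usage sketch may help show how such a method is meant to be used: applying it twice with the same secret gives back the original bytes. The class name, the sample text and the secret are made up for the illustration, and the method is repeated (written with a modulo index, which yields the same result) so that the example compiles on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class showing that XOR with the same secret is its own inverse.
class XorRoundTrip {

    // Same result as the method above; the key index wraps around via modulo.
    static byte[] xor(final byte[] input, final byte[] secret) {
        if (secret.length == 0) {
            throw new IllegalArgumentException("empty security key");
        }
        final byte[] output = new byte[input.length];
        for (int pos = 0; pos < input.length; ++pos) {
            output[pos] = (byte) (input[pos] ^ secret[pos % secret.length]);
        }
        return output;
    }

    public static void main(String[] args) {
        byte[] plain = "some text".getBytes(StandardCharsets.UTF_8);
        byte[] secret = "key".getBytes(StandardCharsets.UTF_8);

        byte[] scrambled = xor(plain, secret);    // keep this as bytes, not as a String
        byte[] restored = xor(scrambled, secret); // second pass undoes the first

        System.out.println(Arrays.equals(plain, restored));               // prints true
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // prints some text
    }
}
```

Note that this only obscures the data. Anyone who has the secret, or enough scrambled output, can reverse it.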

yegor256


hi can you explain to me please, how exactly should this work. – 5er Dec 06 '15 at 03:17
Hi, can you please explain how exactly this should work? My thinking is like this: create a "secret", then 1. create the encoded string with the code above and add it to the source; 2. at runtime, decode this encoded string. Every time I would use the same secret and the same algorithm. My question is where to hide the secret so that a potential hacker won't be able to get my public key. – 5er Dec 06 '15 at 03:23

score 3 · Answer 6 · edited May 23 '17 at 11:47

Assuming (!) the strings are of equal length, why not convert the strings to byte arrays and then XOR the bytes? Note that the resulting byte arrays may still differ in length depending on your encoding (e.g. UTF-8 will expand to different byte lengths for different characters).

You should be careful to specify the character encoding to ensure consistent/reliable string/byte conversion.

Community


answered Feb 26 '11 at 11:32

Brian Agnew


The strings could be of equal length but the byte arrays might be of different lengths. ;) – Peter Lawrey Feb 26 '11 at 11:33
@PeterLawrey Can you explain me when the byte arrays' length can differ? – artaxerxe Apr 17 '12 at 08:47
If you have `"$".getBytes()` it could be 1 byte, "£" could be 2 bytes and "€" could be 3 bytes. (They are in UTF-8) – Peter Lawrey Apr 17 '12 at 08:54
@PeterLawrey That means that any `char` with `int` representation > than 255 will be represented on more than 1 byte? (in UTF-8) – artaxerxe Apr 17 '12 at 09:03
Any char > 127 uses more than one byte in UTF-8. Some use two or three. Strings can contain code points (characters > 65535) and they can use 4 bytes. – Peter Lawrey Apr 17 '12 at 09:08
To clarify, code points in Java can be between 0 (Character.MIN_CODE_POINT) and 0x10FFFF (Character.MAX_CODE_POINT) – Peter Lawrey Apr 17 '12 at 09:52
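The byte lengths discussed in these comments can be checked with a few lines. This is a small sketch with a hypothetical class and method name; U+1F600 is picked here only as an example of a code point above 65535.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: number of UTF-8 bytes for strings of one character each.
class Utf8Lengths {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String dollar = "$";
        String pound = "\u00a3";                                // pound sign, U+00A3
        String euro = "\u20ac";                                 // euro sign, U+20AC
        String astral = new String(Character.toChars(0x1F600)); // code point above 65535

        System.out.println(utf8Length(dollar));  // prints 1
        System.out.println(utf8Length(pound));   // prints 2
        System.out.println(utf8Length(euro));    // prints 3
        System.out.println(utf8Length(astral));  // prints 4
        System.out.println(astral.length());     // prints 2, one surrogate pair
    }
}
```

So two Strings with the same `length()` can produce byte arrays of different lengths, which is why the XOR loop has to decide what to do when one array runs out first.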

score 2 · Answer 7 · edited Oct 14 '16 at 13:40

The abs call handles Strings that are not the same length, so the length of the result is the same as the length of the shorter of the two Strings a and b.

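public String xor(String a, String b) {
    // Swap so that 'a' is the shorter string; the loop then stays inside 'b'.
    if (a.length() > b.length()) { String t = a; a = b; b = t; }
    // Computed once, outside the loop: aligns 'a' with the tail of 'b'.
    int offset = Math.abs(a.length() - b.length());
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < a.length(); k++)
        // Cast to char; append(int) would append the decimal value instead.
        sb.append((char) (a.charAt(k) ^ b.charAt(k + offset)));
    return sb.toString();
}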

alex


answered May 22 '14 at 15:45

user3514540


you don't really need to calculate abs in a loop. – alex Oct 14 '16 at 13:40
