
When running the following piece of code, the built-in String.getBytes() method seems to be slower than the custom getBytesFast() implementation. You can use Arrays.equals(str.getBytes(), getBytesFast(str)) to verify that both byte arrays are equal.

The getBytesFast implementation is a modified version of the implementation included in this programming tips article (1997): http://java.sun.com/developer/technicalArticles/Programming/Performance/

I'm looking for a well-documented answer on why the built-in implementation is slower than the custom implementation.

package com.test;

public class Performance {

    public static void main(String args[]) {

        // Test input: the sentence "This is a performance test! " repeated
        // several hundred times. The original inlined the whole text as one
        // literal, which does not compile because a Java string literal cannot
        // span source lines; 500 approximates the original repetition count.
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("This is a performance test! ");
        }
        final String str = sb.toString();

        long startTime_1 = System.nanoTime();
        str.getBytes();
        System.out.println(System.nanoTime() - startTime_1);

        long startTime_2 = System.nanoTime();
        getBytesFast(str);
        System.out.println(System.nanoTime() - startTime_2);
    }

    private static byte[] getBytesFast(String str) {
        final char buffer[] = new char[str.length()];
        final int length = str.length();
        str.getChars(0, length, buffer, 0);
        final byte b[] = new byte[length];
        // Keep only the low 8 bits of each char (correct only for U+0000..U+00FF)
        for (int j = 0; j < length; j++)
            b[j] = (byte) buffer[j];
        return b;
    }
}
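The Arrays.equals check mentioned above depends on the platform default charset when it calls str.getBytes(). A minimal sketch that makes the check deterministic by naming the charset the cast implicitly assumes (ISO-8859-1); the class name GetBytesCheck and the helper matchesLatin1 are hypothetical, and getBytesFast is copied from the question so the class is self-contained:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesCheck {

    // Same logic as getBytesFast in the question: keep the low 8 bits of each char
    static byte[] getBytesFast(String str) {
        final int length = str.length();
        final char[] buffer = new char[length];
        str.getChars(0, length, buffer, 0);
        final byte[] b = new byte[length];
        for (int j = 0; j < length; j++) {
            b[j] = (byte) buffer[j];
        }
        return b;
    }

    // True when the cast-based conversion agrees with a real ISO-8859-1 encoding
    static boolean matchesLatin1(String str) {
        return Arrays.equals(str.getBytes(StandardCharsets.ISO_8859_1), getBytesFast(str));
    }

    public static void main(String[] args) {
        System.out.println(matchesLatin1("This is a performance test! ")); // true
        System.out.println(matchesLatin1("\u03b1"));                      // false
    }
}
```

For the Greek letter α, ISO-8859-1 encoding substitutes '?' (63), while the cast keeps the low byte 0xB1, so the arrays differ.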

EDITED:

Caliper benchmark results: [image not available]

Thanks

Apostolos Emmanouilidis
  • A single run execution is not a proper way of measuring performance. Many external factors can alter the result. – SJuan76 Sep 02 '12 at 21:50
  • [*How do I write a correct micro-benchmark in Java?*](http://stackoverflow.com/questions/504103) – Tomasz Nurkiewicz Sep 02 '12 at 21:52
  • I was curious, so I copied your code and ran it in Eclipse myself, and it output 1 for both methods, so your test is certainly not conclusive. – rjsang Sep 02 '12 at 21:55
  • The reason is explained *in the article,* which incidentally is dated 1997, and also states that it does incorrect char-byte conversion. – user207421 Sep 02 '12 at 22:13
  • Your `getBytesFast` method would most likely fail if your test string included characters from outside the Latin-1 block. (For instance, try it with a Greek letter α ("\u03b1") in the string.) – Ted Hopp Sep 02 '12 at 22:33
  • Thank you all for your valuable comments. – Apostolos Emmanouilidis Sep 02 '12 at 22:52
  • Often custom implementations are faster than the built-in ones because a) you can make (or have made) lots of assumptions that the built-in libraries do not or cannot, and b) some of the libraries were written a long time ago and cannot be "improved", as that might change their behaviour. – Peter Lawrey Sep 03 '12 at 07:30
  • Just for the people who are curious about what this looks like in a Caliper benchmark: http://microbenchmarks.appspot.com/run/fabian.barney@gmail.com/benchmark.caliper.StringGetBytes – Fabian Barney Sep 04 '12 at 11:10
  • The post has been updated with the Caliper benchmark results, which FabianBarney posted. @FabianBarney: Thank you – Apostolos Emmanouilidis Sep 04 '12 at 13:02
  • The conclusion is not valid. What's right or wrong here is not about conversion between character encodings. Whether your method produces the correct result depends on which code points are used. – Fabian Barney Sep 04 '12 at 14:27
  • @FabianBarney: I removed the word "conclusion", stating that this is my personal opinion. Thanks – Apostolos Emmanouilidis Sep 04 '12 at 14:59
  • Whether the faster version's speed advantage is worth its downsides is up to you. But what you wrote about conversion between character encodings is still simply wrong. None of this has anything to do with conversion between character encodings; it's about the byte representation of Strings, the code points used, and so on. – Fabian Barney Sep 04 '12 at 16:11
  • @FabianBarney Every character can be represented differently depending on the encoding scheme used. That's why the getBytes(charSet) operation requires a “charsetName” in order to convert the String's characters. So what I meant is that if the encoding scheme matters, then String.getBytes(charSet) and the NIO framework should be used. Maybe I didn't express it well. – Apostolos Emmanouilidis Sep 04 '12 at 17:08
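Following the microbenchmarking comments above, a dedicated harness such as JMH or Caliper is the right tool. As a rough illustration only, here is a stdlib-only sketch of the two habits those harnesses automate: warming up the JIT before measuring, and consuming results so the work is harder to eliminate as dead code. The class WarmBench, the iteration counts, and the 500-repetition input are hypothetical choices; the built-in side is called with an explicit ISO_8859_1 charset so both sides produce the same bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class WarmBench {

    // Accumulates result lengths so the JIT cannot trivially drop the calls
    static long sink;

    // Runs f on s `iterations` times and returns the average time per call in ns
    static double averageNanos(Function<String, byte[]> f, String s, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += f.apply(s).length;
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    // Cast-based conversion from the question
    static byte[] castBytes(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            b[i] = (byte) s.charAt(i);
        }
        return b;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("This is a performance test! ");
        }
        String str = sb.toString();

        Function<String, byte[]> builtIn = x -> x.getBytes(StandardCharsets.ISO_8859_1);
        Function<String, byte[]> cast = WarmBench::castBytes;

        // Warm-up rounds give the JIT a chance to compile both paths first
        for (int round = 0; round < 5; round++) {
            averageNanos(builtIn, str, 10_000);
            averageNanos(cast, str, 10_000);
        }

        System.out.printf("getBytes(ISO_8859_1): %.1f ns/op%n", averageNanos(builtIn, str, 100_000));
        System.out.printf("cast loop:            %.1f ns/op%n", averageNanos(cast, str, 100_000));
    }
}
```

The printed numbers will vary by JVM version and machine; this sketch still ignores GC noise, on-stack replacement, and other effects that a real harness controls for.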

3 Answers


According to [the documentation], the char type in Java is a 16-bit Unicode character, whereas the byte type is an 8-bit signed integer. This means that with each char-to-byte cast made in your code you're throwing away half the character data.

The Java tutorial on Character and Byte Streams has a nice little example string using Japanese Kanji:

String jaString = new String("\u65e5\u672c\u8a9e\u6587\u5b57\u5217");

For each character in that string your fast conversion method would throw away the first byte of information (e.g. the 65 in \u65e5). Your link also specifically mentions that String.getBytes() runs several times more slowly "because the former does correct byte-to-char conversion, which involves a function call per character."

If you completely ignore character encoding and throw away the higher-order byte of each char, you'll get a bit of a speedup. Just keep in mind that this method only produces correct output for characters in the range \u0000 to \u00FF (exactly the ISO-8859-1 repertoire) and silently loses data for everything else.
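To make the data loss concrete, here is a small sketch using the Kanji string from the tutorial. The class KanjiCast and its helpers are hypothetical; castBytes mirrors the cast in getBytesFast, and roundTrip rebuilds chars from the surviving low bytes:

```java
import java.nio.charset.StandardCharsets;

public class KanjiCast {

    // The Japanese example string from the Java tutorial quoted above
    static final String JA = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";

    // Cast-based conversion, as in getBytesFast: keeps only the low byte of each char
    static byte[] castBytes(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            b[i] = (byte) s.charAt(i);
        }
        return b;
    }

    // Rebuild chars from the low bytes to show what survives the round trip
    static String roundTrip(byte[] b) {
        char[] c = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            c[i] = (char) (b[i] & 0xFF);
        }
        return new String(c);
    }

    public static void main(String[] args) {
        byte[] cast = castBytes(JA);
        byte[] utf8 = JA.getBytes(StandardCharsets.UTF_8);
        System.out.println(cast.length + " bytes via cast, " + utf8.length + " bytes via UTF-8"); // 6 bytes via cast, 18 bytes via UTF-8
        System.out.println(JA.equals(roundTrip(cast)));                          // false
        System.out.println(JA.equals(new String(utf8, StandardCharsets.UTF_8)));  // true
    }
}
```

Each of these characters needs three bytes in UTF-8, so a lossless encoding produces 18 bytes, while the cast keeps only 6 (for example 0xE5 from \u65e5) and cannot be reversed.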

DaoWen

String.getBytes takes into account the default charset of the system. Your implementation assumes ISO-8859-1.

String.getBytes() eventually ends up calling the following method, where ce is a CharsetEncoder:

// Excerpt from the JDK 7/8-era java.lang.StringCoding (nested StringEncoder class);
// ce (CharsetEncoder), cs (Charset) and isTrusted are fields of that class.
byte[] encode(char[] ca, int off, int len) {
    // Allocate for the worst case: len * maxBytesPerChar bytes
    int en = scale(len, ce.maxBytesPerChar());
    byte[] ba = new byte[en];
    if (len == 0)
        return ba;
    if (ce instanceof ArrayEncoder) {
        // Fast path for encoders that can work directly on arrays
        int blen = ((ArrayEncoder)ce).encode(ca, off, len, ba);
        return safeTrim(ba, blen, cs, isTrusted);
    } else {
        // General path through the NIO buffer API
        ce.reset();
        ByteBuffer bb = ByteBuffer.wrap(ba);
        CharBuffer cb = CharBuffer.wrap(ca, off, len);
        try {
            CoderResult cr = ce.encode(cb, bb, true);
            if (!cr.isUnderflow())
                cr.throwException();
            cr = ce.flush(bb);
            if (!cr.isUnderflow())
                cr.throwException();
        } catch (CharacterCodingException x) {
            // Substitution is always enabled,
            // so this shouldn't happen
            throw new Error(x);
        }
        return safeTrim(ba, bb.position(), cs, isTrusted);
    }
}

private static int scale(int len, float expansionFactor) {
    // We need to perform double, not float, arithmetic; otherwise
    // we lose low order bits when len is larger than 2**24.
    return (int)(len * (double)expansionFactor);
}

// Copies the result into an exactly sized array unless it already fits
private static byte[] safeTrim(byte[] ba, int len,
                               Charset cs, boolean isTrusted) {
    if (len == ba.length && (isTrusted || System.getSecurityManager() == null))
        return ba;
    else
        return Arrays.copyOf(ba, len);
}

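Compared with a plain cast loop, this path does considerably more work: it goes through a CharsetEncoder, allocates an output array sized for the worst case (len × maxBytesPerChar()), maps each character through the encoder (substituting unmappable ones), and may copy the result into an exactly sized array. That extra work could account for the slower execution times you are seeing.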
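To see what the general branch above does without relying on JDK-internal classes, here is a sketch that follows the same steps with public NIO APIs only: a worst-case buffer, encode plus flush, then a trim. It is an approximation, not the JDK's actual code; the class EncoderPath is hypothetical, and it sets REPLACE actions explicitly to mimic the "substitution is always enabled" behaviour:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncoderPath {

    // Roughly mirrors the non-ArrayEncoder branch quoted above, using public APIs
    static byte[] encode(String s, Charset cs) {
        CharsetEncoder ce = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Worst-case output size, computed in double like scale() above
        int en = (int) (s.length() * (double) ce.maxBytesPerChar());
        ByteBuffer bb = ByteBuffer.allocate(en);
        CharBuffer cb = CharBuffer.wrap(s);
        try {
            CoderResult cr = ce.encode(cb, bb, true);
            if (!cr.isUnderflow()) cr.throwException();
            cr = ce.flush(bb);
            if (!cr.isUnderflow()) cr.throwException();
        } catch (CharacterCodingException x) {
            // With REPLACE actions and a worst-case buffer this should not happen
            throw new IllegalStateException(x);
        }
        // Trim to the bytes actually written (the safeTrim step)
        return Arrays.copyOf(bb.array(), bb.position());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("abc", StandardCharsets.ISO_8859_1)));    // [97, 98, 99]
        System.out.println(Arrays.toString(encode("\u03b1", StandardCharsets.ISO_8859_1))); // [63]
    }
}
```

Every step here (encoder setup, buffer wrapping, result checks, final copy) is overhead that the cast loop in the question simply skips.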

Jeffrey

It may be because String.getBytes() uses or delegates to a Charset (the JVM's current default one), whereas your "fast" implementation effectively hard-codes the ISO-8859-1 charset.
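The claim that the cast is effectively hard-coded ISO-8859-1 can be checked exhaustively over single chars. A minimal sketch, with the hypothetical class Latin1CastCheck, that compares the cast against String.getBytes(StandardCharsets.ISO_8859_1) for every char and reports where they first disagree:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1CastCheck {

    // Encode one char the way getBytesFast does: keep only the low 8 bits
    static byte castByte(char c) {
        return (byte) c;
    }

    // Encode one char through the ISO-8859-1 charset ('?' for unmappable chars)
    static byte latin1Byte(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1)[0];
    }

    // Returns the first char value, scanning up from U+0000, where the two disagree
    static int firstMismatch() {
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            if (castByte((char) c) != latin1Byte((char) c)) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.printf("First mismatch: U+%04X%n", firstMismatch()); // First mismatch: U+0100
    }
}
```

The two agree for all of U+0000 to U+00FF and diverge at U+0100, which is why the cast matches getBytes() only when the default charset encodes those characters the same way and the text stays in that range.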

(Note: I did not verify your results; I'm just stating my hypothesis here. The comments about microbenchmarks are very important here, and definitely more valuable than my answer to your question :)

Piotr Findeisen