1

In Java, what is the FASTEST way to convert a substring to an integer WITHOUT using Integer.parseInt? I want to avoid parseInt because it requires making a temporary string that is a copy of the substring I want converted.

"abcd12345abcd"  <-- just want chars 4..8 converted.

I would like to avoid making a new temp string by not using substring.

If I were to roll my own, is there a way to avoid the overhead of the array bounds checking I see inside String.charAt(int)?

EDIT

I got a lot of good information from everyone...and the usual warnings about premature optimization :) The basic answer is that there is nothing better than String.charAt or char[]. Unsafe code is on the way out (maybe). It is likely that the JIT compiler can optimize away redundant range checks on array accesses.

I did some benchmarking, and the savings due to not using substring and rolling a specific parseInt are huge.

32% of the cost of calling Integer.parseInt(str.substring(4,8)) comes from the substring. This does not include subsequent garbage collection costs.
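For readers on Java 9 or later (an assumption; this question was written against an earlier release), `Integer.parseInt(CharSequence, int, int, int)` parses a character range in place, so no temporary substring is needed. A minimal sketch; the class and method names are made up for illustration:

```java
final class RangeParseDemo {
    // Parses the decimal digits in s[begin, end) without calling substring().
    // Relies on the Integer.parseInt(CharSequence, int, int, int) overload
    // that was added in Java 9.
    public static int parseRange(String s, int begin, int end) {
        return Integer.parseInt(s, begin, end, 10);
    }

    public static void main(String[] args) {
        System.out.println(parseRange("abcd12345abcd", 4, 9)); // prints 12345
    }
}
```

This overload still does the full sign, digit, and overflow validation of parseInt, so a hand-rolled loop tailored to known-good data may well remain faster; only a measurement can tell.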

Integer.parseInt is designed to handle a very wide set of inputs. By rolling my own parseInt (specific to what our data looks like) using charAt, I was able to achieve a 6x speedup over the substring method.

The suggestion in the comments to try char[] led to a performance increase of about 7x. However, your data must already be in a char[], as the cost of converting to a char array is high. For parsing text, it seems to make sense to stay entirely within char[] and write a few functions to compare strings.
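As a rough idea of what "a few functions to compare strings" could look like when the data stays in a char[], here is a sketch of a region comparison. `regionEquals` is a hypothetical helper, not part of the benchmark:

```java
final class CharRegion {
    // Returns true when buf[offset .. offset + expected.length()) matches expected.
    // Hypothetical helper for illustration; it allocates nothing.
    public static boolean regionEquals(char[] buf, int offset, String expected) {
        int len = expected.length();
        // Reject ranges that would fall outside the buffer.
        if (offset < 0 || len > buf.length - offset) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[offset + i] != expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] msg = "abcd1234abcd".toCharArray();
        System.out.println(regionEquals(msg, 0, "abcd")); // true
        System.out.println(regionEquals(msg, 4, "abcd")); // false
    }
}
```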

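Benchmark results (elapsed nanoseconds for 0x100000 iterations; smaller is faster). GetNumber1 and GetNumber2 are Atoi6 and Atoi7 in the code below: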

parseInt(substring)  23731665
parseInt(string)     16859226
Atoi1                 7116633
Atoi2                 4514031
Atoi3 char[]          4135355
Atoi4 char[]          3503638
Atoi5 char[]          5485495
GetNumber1            8666020
GetNumber2            5951939
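The ratios can be recomputed from the figures listed here. With these particular numbers the substring share comes out at roughly 29% (close to the 32% quoted, which may come from a different run), Atoi2 at about 5.3x and Atoi4 at about 6.8x. A small sketch of the arithmetic; the class and method names are illustrative only:

```java
final class SpeedupTable {
    // Ratio of the baseline time to another time, i.e. "how many times faster".
    public static double speedup(long baselineNanos, long nanos) {
        return (double) baselineNanos / nanos;
    }

    // Fraction of the baseline time that disappears when substring() is skipped.
    public static double substringShare(long withSubstring, long withoutSubstring) {
        return (double) (withSubstring - withoutSubstring) / withSubstring;
    }

    public static void main(String[] args) {
        long substringParse = 23731665L; // parseInt(substring)
        long plainParse     = 16859226L; // parseInt(string)
        long atoi2          =  4514031L; // Atoi2 (charAt)
        long atoi4          =  3503638L; // Atoi4 (char[])

        System.out.printf("substring share: %.0f%%%n",
                substringShare(substringParse, plainParse) * 100);
        System.out.printf("Atoi2 speedup:   %.1fx%n", speedup(substringParse, atoi2));
        System.out.printf("Atoi4 speedup:   %.1fx%n", speedup(substringParse, atoi4));
    }
}
```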

During benchmarking, I also experimented with Inline on and off and verified that the compiler was properly inlining everything.

Here is my benchmarking code if anyone cares...

package javaatoi;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JavaAtoi {

    static int cPasses = 10;
    static int cTests = 9;
    static int cIter = 0x100000;
    static int cString = 0x100;
    static int fStringMask = cString - 1;

    public static void main(String[] args) throws InterruptedException {

        // setup test data.  Use a large enough set that the compiler 
        // won't unroll the loop.  Use a small enough set that we are 
        // keeping the data in L2.  I don't want to measure memory loads.

        String[] a = new String[cString];
        for (int i = 0 ; i< cString ; i+=4) {
            // leading zeros will occur, so add one number with one.
            a[i+0] = "abcd01234abcd";
            a[i+1] = "abcd1234abcd";
            a[i+2] = "abcd1234abcd";
            a[i+3] = "abcd1234abcd";
        }

        // array of pre-substringed stuff
        String[] a1 = new String[cString];
        for (int i=0 ; i< cString ; ++i)
            a1[i]= a[i].substring(4,8);

        // char array version of the strings
        char[][] b = new char[cString][];
        for (int i =0 ; i<cString ; ++i)
            b[i] = a[i].toCharArray();

        // array to hold times for each test for each pass
        long[][] t = new long[cPasses][cTests];

        // multiple dry runs to let the compiler optimize the functions
        for (int i=0 ; i<50 ; ++i) {
          t[0][0] = TestParseInt1(a)[0];
          t[0][1] = TestParseInt2(a1)[0];
          t[0][2] = TestAtoi1(a)[0];
          t[0][3] = TestAtoi2(a)[0];
          t[0][4] = TestAtoi3(b)[0];
          t[0][5] = TestAtoi4(b)[0];
          t[0][6] = TestAtoi5(b)[0];
          t[0][7] = TestAtoi6(a)[0];
          t[0][8] = TestAtoi7(a)[0];
        }

        // now do a bunch of tests
        for (int i=0 ; i<cPasses ; ++i) {
            t[i][0] = TestParseInt1(a)[0];
            t[i][1] = TestParseInt2(a1)[0];
            t[i][2] = TestAtoi1(a)[0];
            t[i][3] = TestAtoi2(a)[0];
            t[i][4] = TestAtoi3(b)[0];
            t[i][5] = TestAtoi4(b)[0];
            t[i][6] = TestAtoi5(b)[0];
            t[i][7] = TestAtoi6(a)[0];
            t[i][8] = TestAtoi7(a)[0];
        }

        // setup mins - we only care about min time.
        t[cPasses-1] = new long[cTests];
        for (int i=0 ; i<cTests ; ++i)
            t[cPasses-1][i] = Long.MAX_VALUE;  // sentinel larger than any measured time
        for (int j=0 ; j<cTests ; ++j) {
            for (int i=0 ; i<cPasses-1 ; ++i) {
                long n = t[i][j];
                if (n < t[cPasses-1][j])
                    t[cPasses-1][j] = n;
            }
        }

        // output string
        StringBuilder s = new StringBuilder();
        for (int j=0 ; j<cTests ; ++j) {
            for (int i=0 ; i<cPasses ; ++i) {
                long n = t[i][j];
                s.append(String.format("%9d", n));
            }
            s.append("\n");
        }
        System.out.println(s);

        // if you comment out the part of TestParseInt1 you can sorta see the 
        // gc cost.
        System.gc(); // Trying to get an idea of the total substring cost
        Thread.sleep(1000);  // i dunno if this matters.  Seems like the gc takes a little while.  Not real exact...

        long collectionTime = 0;
        for (GarbageCollectorMXBean garbageCollectorMXBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = garbageCollectorMXBean.getCollectionTime();
            if (n > 0) 
                collectionTime += n;
        }

        System.out.println(collectionTime*1000000);
    }

   // you have to put each test function in its own wrapper to 
   // get the compiler to fairly optimize each test.
   // I also made sure I incremented n and used a large # of string
   // to make it harder for the compiler to eliminate the loops.

    static long[] TestParseInt1(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        // comment this out to get an idea of gc cost without the substrings
        // then uncomment to get idea of gc cost with substrings
        for (int i=0 ; i<cIter ; ++i) 
            n += Integer.parseInt(a[i&fStringMask].substring(4,8));
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestParseInt2(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Integer.parseInt(a[i&fStringMask]);
        return new long[] { System.nanoTime() - startTime, n };
    }


    static long[] TestAtoi1(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi1(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi2(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi2(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi3(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi3(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi4(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi4(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi5(char[][] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi5(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi6(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi6(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static long[] TestAtoi7(String[] a) {
        long n = 0;
        long startTime = System.nanoTime();
        for (int i=0 ; i<cIter ; ++i) 
            n += Atoi7(a[i&fStringMask], 4, 4);
        return new long[] { System.nanoTime() - startTime, n };
    }

    static int Atoi1(String s, int i0, int cb) {
        int n = 0;
        boolean fNeg = false;   // for unsigned T, this assignment is removed by the optimizer
        int i = i0;
        int i1 = i + cb;
        int ch;
        // skip leading crap, scan for -
        for ( ; i<i1 && ((ch = s.charAt(i)) > '9' || ch <= '0') ; ++i) {
            if (ch == '-') 
                fNeg = !fNeg;
        }
        // here is the loop to process the valid number chars.
        for ( ; i<i1 ; ++i) 
            n = n*10 + (s.charAt(i) - '0'); 
        return (fNeg) ? -n : n;
    }

    static int Atoi2(String s, int i0, int cb) {
        int n = 0;
        for (int i=i0 ; i<i0+cb ; ++i) {
            char ch = s.charAt(i);
            n = n*10 + ((ch <= '0') ? 0 : ch - '0');
        }
        return n;
    }

    static int Atoi3(char[] s, int i0, int cb) {
        int n = 0, i = i0, i1 = i + cb;
        // skip leading spaces or zeros
        for ( ; i<i1 && s[i] <= '0' ; ++i) { }
        // loop to process the valid number chars.
        for ( ; i<i1 ; ++i) 
            n = n*10 + (s[i] - '0');
        return n;
    }   

    static int Atoi4(char[] s, int i0, int cb) {
        int n = 0;
        // loop to process the valid number chars.
        for (int i=i0 ; i<i0+cb ; ++i) {
            char ch = s[i];
            n = n*10 + ((ch <= '0') ? 0 : ch - '0');
        }
        return n;
    }   

    static int Atoi5(char[] s, int i0, int cb) {
        int ch, n = 0, i = i0, i1 = i + cb;
        // skip leading crap or zeros
        for ( ; i<i1 && ((ch = s[i]) <= '0' || ch > '9') ; ++i) { }
        // loop to process the valid number chars.
        for ( ; i<i1 && (ch = s[i] - '0') >= 0 && ch <= 9 ; ++i) 
            n = n*10 + ch;
        return n;
    }   

    static int Atoi6(String data, int start, int length) {
        int number = 0;
        for (int i = start; i < start + length; i++) {  // '<', not '<=': read exactly length chars
            if (Character.isDigit(data.charAt(i))) {
                number = (number * 10) + (data.charAt(i) - 48);
            }
        }       
        return number;
    }

    static int Atoi7(String data, int start, int length) {
        int number = 0;
        for (int i = start; i < start + length; i++) {  // '<', not '<=': read exactly length chars
            char ch = data.charAt(i);
            if (ch >= '0' && ch <= '9') {
                number = (number * 10) + (ch - 48);
            }
        }       
        return number;
    }

}
johnnycrash
  • 2
    So if the string were "ABC123DEF456", would the resulting integer be 123 or 123456? – Chris Forrence Jul 15 '15 at 20:49
  • @BrandonLing: Not entirely; you have to get rid of the non-numeric characters first. – Makoto Jul 15 '15 at 20:49
  • 1
    Does the numeric part _always_ start at index 4? – Sean Bright Jul 15 '15 at 20:49
  • 1
    The substring will always be digits. We know the position and length in the string at compile time too. – johnnycrash Jul 15 '15 at 20:50
  • @Makoto i see i was a little confused by the question – Brandon Ling Jul 15 '15 at 20:50
  • Wait. If you already *have* the substring, what's stopping you from using `Integer#parseInt`? – Makoto Jul 15 '15 at 20:50
  • Sorry! I'm a c++ guy. I don't know if there are secret string functions or classes that do what I want. I also want to avoid making temp strings if I can. – johnnycrash Jul 15 '15 at 20:51
  • Also I know a roll your own version will have to use charAt and I was wondering if that gets optimized by the compiler or if you suffer the bounds check on every call. – johnnycrash Jul 15 '15 at 20:51
  • @makoto see that's what im saying.. why not use parseInt – Brandon Ling Jul 15 '15 at 20:51
  • 3
    String is immutable; any operation done on a String creates a new one. Unless you really feel like dealing with arrays, the overhead from creating a new String is so minuscule, there's no reason to try to code around it until we know it's a huge problem. – Makoto Jul 15 '15 at 20:52
  • I edited the question to say I don't want to use parseInt since it requires a temp string. – johnnycrash Jul 15 '15 at 20:53
  • I really want to avoid all the talk about optimization being a bad thing and just focus on the question. – johnnycrash Jul 15 '15 at 20:54
  • You don't have a question - you don't want to use `substring` and you don't want to use `parseInt` so you will have to walk the array and build the number yourself. – Sean Bright Jul 15 '15 at 20:55
  • Ok. Part of my question was if I roll my own, what is the best way to avoid the bounds checking performed by charAt. – johnnycrash Jul 15 '15 at 20:56
  • You can't. Roll your own JRE? Don't put the data in a `String` in the first place? – Sean Bright Jul 15 '15 at 20:58
  • I don't want to roll my own JRE! What about charsequence? – johnnycrash Jul 15 '15 at 20:59
  • How does this data get into your `String` in the first place? Could you put it in a `char[]` when it enters your program instead of a `String`? – Sean Bright Jul 15 '15 at 21:01
  • I think we could. They are parsing a large legacy message block. Would char array be better than a string? – johnnycrash Jul 15 '15 at 21:07
  • A string is essentially a char array – Shar1er80 Jul 15 '15 at 21:08
  • @Sean Thank you for your comment. I politely asked to table all the comments about the problems with performance optimization. – johnnycrash Jul 16 '15 at 01:24
  • @Sean. I don't want to piss you off. I liked a lot of your comments. I am not trying to save memory, just very important CPU cycles. It's not for glory. It's actually a pain in the ass for no glory. – johnnycrash Jul 16 '15 at 01:50
  • @johnnycrash You aren't pissing me off, you're just wasting your time and it's frustrating that no one can effectively communicate that to you. When you do figure out... whatever it is you are trying to figure out, please come back and add your own answer to your question. – Sean Bright Jul 16 '15 at 12:12
  • @Sean I've got a solution now that is 7 times faster than using parseInt with substring. I found everything everyone said in here useful - the answers and the comments. I am grateful that people offered possible answers even if they were imperfect or non standard. I needed to know everything everyone could think of and then see the comments. This was great! I learned that there really isn't anything better than charAt or using a char []. Your comments were all quite helpful even the ones with colorful commentary on the worth of my endeavors. :) – johnnycrash Jul 16 '15 at 16:01
  • @johnnycrash You should move your "update" into an answer and mark it as accepted. There is nothing wrong with answering your own question and it is preferred to leaving a question as unanswered. Heck, I'd even upvote it for the sheer persistence. – Sean Bright Jul 16 '15 at 20:15

4 Answers

2

UPDATE

Since you want to mimic C/C++ behavior in Java, I did some googling and came across http://ssw.jku.at/Research/Papers/Wuerthinger07/, which may interest you.

Array Bounds Check Elimination for the Java HotSpot™ Client Compiler

Abstract

Whenever an array element is accessed, Java virtual machines execute a compare instruction to ensure that the index value is within the valid bounds. This reduces the execution speed of Java programs. Array bounds check elimination identifies situations in which such checks are redundant and can be removed. We present an array bounds check elimination algorithm for the Java HotSpot™ VM based on static analysis in the just-in-time compiler.

The algorithm works on an intermediate representation in static single assignment form and maintains conditions for index expressions. It fully removes bounds checks if it can be proven that they never fail. Whenever possible, it moves bounds checks out of loops. The static number of checks remains the same, but a check inside a loop is likely to be executed more often. If such a check fails, the executing program falls back to interpreted mode, avoiding the problem that an exception is thrown at the wrong place.

The evaluation shows a speedup near to the theoretical maximum for the scientific SciMark benchmark suite (40% on average). The algorithm also improves the execution speed for the SPECjvm98 benchmark suite (2% on average, 12% maximum).

Full research paper found here http://www.ssw.uni-linz.ac.at/Research/Papers/Wuerthinger07/Wuerthinger07.pdf
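The loop shape the abstract describes can be sketched as follows: validate the range once up front, then run a simple counted loop over it. Whether a given JVM actually removes or hoists the per-element checks here is an assumption that depends on the JIT and compiler tier; the class and method names are made up for illustration.

```java
final class HoistedCheck {
    // Converts s[start .. start + length) to an int, assuming the range
    // holds only ASCII digits. The range is validated once before the loop;
    // the JIT may then be able to prove the accesses inside are in bounds.
    public static int parseDigits(char[] s, int start, int length) {
        if (start < 0 || length < 0 || length > s.length - start) {
            throw new IndexOutOfBoundsException("range out of bounds");
        }
        int end = start + length;
        int n = 0;
        for (int i = start; i < end; i++) {
            n = n * 10 + (s[i] - '0');
        }
        return n;
    }

    public static void main(String[] args) {
        char[] data = "abcd12345abcd".toCharArray();
        System.out.println(parseDigits(data, 4, 5)); // prints 12345
    }
}
```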

OLD ANSWER 2

Since you know the start and length of the digits in the string you can still "roll your own" without bounds checking. Either way, you're going to have to do some kind of extraction to get the number, whether you extract into a temporary string and then convert it, or convert the characters on the fly.

public static void main(String[] args) throws Exception {
    String data = "abcd12345abcd";
    System.out.println(getNumber(data, 4, 5));
}

public static int getNumber(String data, int start, int length)
{
    int number = 0;
    // Read exactly `length` characters starting at `start`.
    for (int i = start; i < start + length; i++) {
        char c = data.charAt(i);
        if ('0' <= c && c <= '9') {
            number = (number * 10) + (c - '0');
        }
    }
    return number;
}

Results:

12345

OLD ANSWER 1

Remove what you don't want with String.replaceAll() and then convert/parse what's left.

public static void main(String[] args) throws Exception {
    String data = "abcd12345abcd";

    int myInt = Integer.valueOf(data.replaceAll("[^0-9]", ""));
    System.out.println(myInt);
}

Results:

12345
Shar1er80
  • He doesn't want to use `String.substring` or `Integer.parseInt` - so I can't imagine a regular expression is what he is after. – Sean Bright Jul 15 '15 at 20:54
  • That is one way to do it. I was looking for a method that was faster than making a temporary substring and passing it to parseInt. This makes a temp string and does more work to boot. Thanks though! – johnnycrash Jul 15 '15 at 20:55
  • If you're going to down vote, at least comment why you're down voting!!! Updated answer is not using substring() or parseInt(). – Shar1er80 Jul 15 '15 at 21:14
  • Ah, sorry. I downvoted because OP is suggesting that a bounds check on an array index is too slow, so a regular expression is going to be orders of magnitude slower. `charAt` is too slow as well for his needs, which your updated answer is suggesting. – Sean Bright Jul 15 '15 at 21:25
  • @SeanBright See updated answer. Not iterating through the entire string, but extracting the number. As the OP comments, "The substring will always be digits. We know the position and length in the string at compile time too" – Shar1er80 Jul 15 '15 at 21:44
  • I thought charAt performed a bounds check? I was wondering if there was a way java developers worked around that. – johnnycrash Jul 16 '15 at 01:58
  • @johnnycrash I've added some references to my answer. In short, charAt() does not perform bounds checking. – Shar1er80 Jul 16 '15 at 02:09
  • The code for charAt does this: if ((index < 0) || (index >= value.length)) throw new StringIndexOutOfBoundsException(index); Which is what I meant by bounds check. – johnnycrash Jul 16 '15 at 03:53
  • `charAt` absolutely does bounds checking. The internal `char[]` inside the `String` will not necessarily be the same length as the `String` itself, so without a bounds check a call to `charAt(32)` on a `String` where `length()` returns `8` could potentially succeed. – Sean Bright Jul 16 '15 at 12:11
  • @johnnycrash Found some information that you might be interested in. Added the information to my answer. – Shar1er80 Jul 16 '15 at 15:29
  • It's going to be tough to pick the answer to my question. You provided parts of it by uncovering possible compiler optimizations. By the way, I included your getNumber function in my benchmarking. It is about 3x faster than parseInt. It becomes 4x faster if you don't call charAt twice and you check >= '0' <= '9' instead of calling isDigit.. – johnnycrash Jul 16 '15 at 15:58
2

Sorry...there's really no way to accomplish what you want to do without either:

  • Creating an intermediate String, or
  • Creating some other intermediate objects in lieu of the String to then be parsed into an int.

Java isn't like C++; a String isn't the same as a char[].

As I mentioned before, any operations done on a String that return String produce a new String instance, so inevitably, you will be dealing with Strings in an intermediate fashion.

The main point is that if you actually know the substring bounds, you should use them to accomplish what you need.

Do not worry about optimization until you can reason that this portion of your code is the largest bottleneck. Even then, stick to optimizations that make sense; you could turn the entire String into an IntStream and only parse elements that were actual numbers in Java 8.

Chances are that this code won't be a major performance hit, and prematurely optimizing it is going to lead you down a very, very painful path.

Realistically speaking, the closest you could get (with Java 8's Stream API) is to do a few conversions between Character and String, but this still creates intermediate Strings:

System.out.println(Integer.parseInt("abcd12345abcd".chars()
                                                   .filter(Character::isDigit)
                                                   .mapToObj(c -> (char) c)
                                                   .map(Object::toString)
                                                   .reduce("", String::concat)));

...which is far uglier to read and understand than this:

System.out.println(Integer.parseInt("abcd12345abcd".substring(4, 9)));
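For comparison, a stream variant that folds the digits straight into an int appears to avoid the intermediate Strings, although the stream pipeline allocates objects of its own and is unlikely to be fast. This is only a sketch: the class and method names are invented, and it keeps every ASCII digit in the input rather than one known range.

```java
final class StreamDigits {
    // Folds every ASCII digit in s into one int, left to right.
    // The accumulator is not associative, so this is only valid on a
    // sequential stream (which String.chars() returns).
    public static int digitsToInt(String s) {
        return s.chars()
                .filter(c -> c >= '0' && c <= '9')
                .reduce(0, (acc, c) -> acc * 10 + (c - '0'));
    }

    public static void main(String[] args) {
        System.out.println(digitsToInt("abcd12345abcd")); // prints 12345
    }
}
```

With an input such as "ABC123DEF456" this yields 123456, because all digits are kept.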
Makoto
  • No, you're *quite* wrong about that. `parseInt` will **blow up** if there are characters that aren't within its base value present in the string. – Makoto Jul 15 '15 at 21:30
  • Well, there is a first time for everything I suppose – Sean Bright Jul 15 '15 at 21:32
  • In both your solutions, are you still using temporary strings which the OP is trying to avoid? – Shar1er80 Jul 15 '15 at 21:41
  • 1
    @Shar1er80: I swear... ***YES.*** That's the point. It's pedantic, ineffective, and inefficient to try to write it any other way. Java isn't like C++; there isn't really any incentive to try and optimize this until you really know that this is the bottleneck. – Makoto Jul 15 '15 at 21:42
  • Agreed, it's just frustrating to be down voted after giving multiple potential answers that acquire the desired result. – Shar1er80 Jul 15 '15 at 21:53
  • @Shar1er80 neither of your potential answers satisfy the stated requirements. I don't understand your confusion. – Sean Bright Jul 15 '15 at 22:04
  • @SeanBright So how does anyone's answer satisfy the stated requirements? Each of us illustrates ways of extracting the number out of the String. – Shar1er80 Jul 15 '15 at 22:19
  • 1
    @Shar1er80: I basically declare that it's a fools' errand in the answer, so that differentiates it a bit. There's really no point in going this pedantic in this; sometimes the answer is "no". – Makoto Jul 15 '15 at 22:22
  • I like these oddball solutions even if they are over the top. Genius ideas come from left field. Yeah the immutable string feature is why i didn't want to use substring... I wanted to avoid the cost of creating a new string. So thank you! I consider your answer to be part of the answer to my question, which is "no there is nothing better". However other people gave parts of the answer too. I wish I could use more than one check box. – johnnycrash Jul 16 '15 at 16:08
  • "Do not worry about optimization until you can reason that this portion of your code is the largest bottleneck." - I have several thoughts on this. #1 is for a task like this that is very common, we need our developers to use the optimal solution when they originally write the code. It takes no longer to use a version of parseInt that takes a char range. #2 in simple code, there might be pinpoint bottlenecks. In complex code like ours, there is a component of the bottleneck that is evenly spread throughout the code by using bad techniques in a hundred thousand places. – johnnycrash Jul 16 '15 at 16:17
0

Please keep in mind that this is not how I would normally approach this problem (I would opt for a regular expression to filter out non-digits). However, the solution below does not create a separate String (aside from an array of chars).


public static int getIntegerFromString(String s) {
    int multiplier = 1, result = 0;  // multiplier must be initialized or `multiplier *= 10` will not compile
    boolean inIntegers = false, beforeInteger = true;
    char[] chars = s.toCharArray();
    char c;

    // Iterate through each character, starting at the end
    for(int i = chars.length - 1; i >= 0; i--) {
        c = chars[i];
        if(Character.isDigit(c)) {

            // The char is a digit, so we either increase the multiplier (if the previous char was also a digit) or prepare our environment
            if(inIntegers) {
                multiplier *= 10;
            }
            else {
                inIntegers = true;
                beforeInteger = false;
                multiplier = 1;
            }

            result += multiplier * Character.getNumericValue(c);
        }
        else if(inIntegers) {
            // We're done with the sequence of integers. Stop the for-loop.
            break;
        }
    }

    return result;
}

[chris@localhost:Projects]$ java Test 3949
3949
[chris@localhost:Projects]$ java Test 3949G
3949
[chris@localhost:Projects]$ java Test E3949G
3949
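The same "last run of digits" idea can be sketched with charAt alone, which skips the array copy made by toCharArray(). This is an illustrative variant under the assumption that only ASCII digits matter; the names are hypothetical.

```java
final class LastDigitRun {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the value of the last run of ASCII digits in s, or 0 if there is none.
    public static int lastDigitRun(String s) {
        int end = s.length();
        // Skip trailing non-digits.
        while (end > 0 && !isAsciiDigit(s.charAt(end - 1))) {
            end--;
        }
        // Walk back to the first character of the digit run.
        int start = end;
        while (start > 0 && isAsciiDigit(s.charAt(start - 1))) {
            start--;
        }
        // Accumulate the run left to right.
        int n = 0;
        for (int i = start; i < end; i++) {
            n = n * 10 + (s.charAt(i) - '0');
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(lastDigitRun("E3949G")); // prints 3949
    }
}
```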
Chris Forrence
  • `toCharArray()` does an array copy which I assume would be slower than just using `charAt()`. – Sean Bright Jul 15 '15 at 21:27
  • Yes. the basic question was is there an analogue of parseInt that can operate on a range of characters in a string in situ. this technically does not create a temp string, but it does call toCharArray which is same thing. However I am very thankful for the attempt! – johnnycrash Jul 16 '15 at 01:37
-2

You might try to have a look at sun.misc.Unsafe. I have actually never used it, but if you want to avoid boundary checks and so on, it might be possible to do that using this (undocumented) class.

see https://stackoverflow.com/questions/5574241/how-can-sun-misc-unsafe-be-used-in-the-real-world

EDIT: On the removal of Unsafe in Java 9 (the author is of the opinion that, since many libraries use it, removing it is not a good idea): http://blog.dripstat.com/removal-of-sun-misc-unsafe-a-disaster-in-the-making/

It is also possible to use JNI, but I guess calling it for trivial methods will result in massive overhead (if boundary checks are already considered overhead).

see What makes JNI calls slow?

The following link might also be interesting; the author also says that methods which are called frequently but have a short run time are hard to optimize: https://thinkingandcomputing.com/2014/03/30/eliminating-jni-overhead/

You can get hold of Unsafe in the following way:

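    // requires: import java.lang.reflect.Field; import sun.misc.Unsafe;
    // theUnsafe is a private static field, so it is read via reflection
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);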

for details, see: http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/

example for unsafe array:

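    // requires: import java.util.Arrays;
    int[] x = new int[]{1,2,3,4};
    final int offset = unsafe.arrayBaseOffset(int[].class);          // byte offset of element 0
    final int arrayIndexScale = unsafe.arrayIndexScale(int[].class); // bytes per element
    for (int i=0;i<4;i++){
        // writes directly into the array's memory, with no bounds check
        unsafe.putInt(x, offset+arrayIndexScale*i, 11*(i+1));
    }
    System.out.println(Arrays.toString(x));
    // Output: [11, 22, 33, 44]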
user140547
  • You're really waving your hand about here. Why would you think `Unsafe` would be a good thing to use? – Makoto Jul 15 '15 at 21:33
  • Not to mention that `Unsafe` is getting killed in Java 9, so this isn't forward compatible. – Sean Bright Jul 15 '15 at 21:34
  • Well, OP's question assumed that creating extra String objects or boundary checks is overhead, so this may be a possibility to avoid that. Then again, one could use C as well. I don't say it is a good idea, as it is obviously against Java's design principles, premature optimization etc. etc., but as "a C++ guy" he hopefully knows what he is doing. – user140547 Jul 15 '15 at 21:45
  • I run into a ton of motive and sanity questioning when I ask performance questions. – johnnycrash Jul 16 '15 at 01:35
  • +1 because this answer is helpful. I learned something. However, if this is being dropped in future java then I can't use it. What about using native and writing a c function to do it? – johnnycrash Jul 16 '15 at 01:41
  • updated my answer - I guess calling C methods using JNI won't help you if boundary checks are already considered unnecessary overhead. – user140547 Jul 16 '15 at 07:26
  • It might be that the entire parser (not just atoi) could be implemented in C and called from Java. That would not have overhead issues. So you have given me ideas. It's nice to know that there is some unsafe stuff I can research more. Unfortunately it's going away, but it will lead to other topics if I research it more. – johnnycrash Jul 16 '15 at 15:54
  • Dangit, I consider this part of the answer but the answer is spread out over more than one person's post! – johnnycrash Jul 16 '15 at 16:03
  • Who knows if Unsafe will really be removed; after all, many libraries use it. Sun and Oracle have so far gone to great lengths to preserve backwards compatibility, so removing it (although this class has never been public API) would mean abandoning a policy which they have followed for 20 years already. – user140547 Jul 16 '15 at 16:19
  • Where can I get Unsafe? I'm using netbeans. I don't seem to have it. – johnnycrash Jul 16 '15 at 17:38
  • Haha one last question. I almost have an unsafe implementation, but I cannot get the address of the first element of an array. – johnnycrash Jul 16 '15 at 18:36
  • I added an example. But be aware that I am not an expert, I also used it for the first time. – user140547 Jul 16 '15 at 21:08