5

I need to compare a String(which is a valid integer) with an int value.

For String str, int integer, my options are :

  1. Integer.parseInt(str) == integer

  2. str.equals(Integer.toString(integer))

Which one is faster and is there any better way?

Following is inspired by EqualsIntString of atlaste's answer

 private static boolean checkEquality(final String string, final long value) {

    if (string == null) {
        return false;
    }
    final int length = string.length();
    if (length == 0) {
        return false;
    }

    long absValue;
    final byte minIndex;
    if (string.charAt(0) == '-') {
        if (value > 0) {
            return false;
        }
        absValue = -value;
        minIndex = 1;
    } else {
        if (value < 0) {
            return false;
        }
        absValue = value;
        if (string.charAt(0) == '+') {
            minIndex = 1;
        } else {
            minIndex = 0;
        }
    }

    for (int idx = length - 1; idx >= minIndex; idx--) {
        final byte rem = (byte) (absValue % 10);
        absValue /= 10;
        final int diff = string.charAt(idx) - rem - 48;
        if (diff != 0) {
            return false;
        }
    }

    return absValue == 0;
}
lab bhattacharjee
  • 1,617
  • 2
  • 17
  • 33

4 Answers4

5

Only one way to know: measure.

A quick JMH benchmark shows that Integer.parseInt is quicker, regardless of the magnitude of the integer. But we are talking about 10 nanoseconds or so, and it's unlikely to make much difference at your application level.

If you are 100% sure that your string is a valid integer, you can also parse it manually, which will be even faster (see the parseManual and the equalsIntString methods in the code below). Which method to use also depends on whether you expect the values to be often equal or often different (if they are often equal, parseManual works better, if you expect them to be different on average, equalsIntString is faster).

So the characteristics of your dataset also matters in the decision.

Full results (score is in nanoseconds per operations: smaller is better) - the (n) column is the integer that is being compared. The first part copmares equal values and the second part compares unequal values.

Benchmark                                       (n)  Mode  Samples   Score   Error  Units
c.a.p.SO30507506.manual                           1  avgt       10   6.579 ± 0.131  ns/op
c.a.p.SO30507506.manual                       12345  avgt       10  10.017 ± 0.401  ns/op
c.a.p.SO30507506.manual                   123456789  avgt       10  12.490 ± 0.243  ns/op
c.a.p.SO30507506.manualAtlaste                    1  avgt       10   7.914 ± 0.144  ns/op
c.a.p.SO30507506.manualAtlaste                12345  avgt       10  15.902 ± 0.593  ns/op
c.a.p.SO30507506.manualAtlaste            123456789  avgt       10  28.117 ± 0.463  ns/op
c.a.p.SO30507506.parse                            1  avgt       10   8.495 ± 0.325  ns/op
c.a.p.SO30507506.parse                        12345  avgt       10  21.614 ± 0.564  ns/op
c.a.p.SO30507506.parse                    123456789  avgt       10  34.692 ± 0.572  ns/op
c.a.p.SO30507506.stringEquals                     1  avgt       10  21.597 ± 0.594  ns/op
c.a.p.SO30507506.stringEquals                 12345  avgt       10  36.948 ± 1.144  ns/op
c.a.p.SO30507506.stringEquals             123456789  avgt       10  44.444 ± 1.011  ns/op

c.a.p.SO30507506.manual_unequal                   1  avgt       10   7.011 ± 0.149  ns/op
c.a.p.SO30507506.manual_unequal               12345  avgt       10  10.244 ± 0.350  ns/op
c.a.p.SO30507506.manual_unequal           123456789  avgt       10  13.135 ± 0.797  ns/op
c.a.p.SO30507506.manualAtlaste_unequal            1  avgt       10   4.328 ± 0.111  ns/op
c.a.p.SO30507506.manualAtlaste_unequal        12345  avgt       10   4.359 ± 0.115  ns/op
c.a.p.SO30507506.manualAtlaste_unequal    123456789  avgt       10   4.339 ± 0.103  ns/op
c.a.p.SO30507506.parse_unequal                    1  avgt       10   8.304 ± 0.251  ns/op
c.a.p.SO30507506.parse_unequal                12345  avgt       10  21.514 ± 0.405  ns/op
c.a.p.SO30507506.parse_unequal            123456789  avgt       10  35.257 ± 1.043  ns/op
c.a.p.SO30507506.stringEquals_unequal             1  avgt       10  19.060 ± 0.162  ns/op
c.a.p.SO30507506.stringEquals_unequal         12345  avgt       10  31.829 ± 0.427  ns/op
c.a.p.SO30507506.stringEquals_unequal     123456789  avgt       10  41.870 ± 0.252  ns/op

Code:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class SO30507506 {

  @Param({"1", "12345", "123456789"}) int n;
  int i;
  String s;

  @Setup public void setup() {
    i = n;
    s = String.valueOf(n);
  }

  @Benchmark public boolean parse() {
    return Integer.parseInt(s) == i;
  }

  @Benchmark public boolean stringEquals() {
    return s.equals(Integer.toString(i));
  }

  @Benchmark public boolean manual() {
    return parseManual(s) == i;
  }

  @Benchmark public boolean manualAtlaste() {
    return equalsIntString(i, s);
  }

  @Benchmark public boolean parse_unequal() {
    return Integer.parseInt(s) == i * 2;
  }

  @Benchmark public boolean stringEquals_unequal() {
    return s.equals(Integer.toString(i * 2));
  }

  @Benchmark public boolean manual_unequal() {
    return parseManual(s) == i * 2;
  }

  @Benchmark public boolean manualAtlaste_unequal() {
    return equalsIntString(i * 2, s);
  }

  private static int parseManual(String s) {
    int result = 0;
    int sign = s.charAt(0) == '-' ? -1 : 1;
    int startIndex = (s.charAt(0) >= '0' && s.charAt(0) <= '9') ? 0 : 1;
    for (int i = startIndex; i < s.length(); i++) {
      result *= 10;
      result += s.charAt(i) - '0';
    }
    return result * sign;
  }

  private static boolean equalsIntString(int value, String s) {
    if (s.isEmpty()) return false; // This is never good.
    if ((s.charAt(0) == '-' && value >= 0) || (s.charAt(0) != '-' && value < 0)) return false; // positive/negative check

    // Define the limit. This is basically the end of the string to check.
    int limit = 0;
    if (value < 0) {
      limit = 1;
      value = -value;
    }

    for (int i = s.length() - 1; i >= limit; --i) {
      char expected = (char) ('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
      value /= 10; // same story.

      if (s.charAt(i) != expected) return false;
    }
    return true;
  }
}
assylias
  • 321,522
  • 82
  • 660
  • 783
  • 1
    Damn, are you fast. The speed ratio is much smaller than I'd expect, but at least the direction is right. The internationalization slows the parsing down a lot. – maaartinus May 28 '15 at 13:28
  • @assylias I'm actually unsure if the JIT is smart enough to just optimize this completely away. After all, you're working with strings that are statically compiled in a hotspot; I've seen the Java JIT doing amazing things with those. – atlaste May 28 '15 at 14:28
  • @atlaste The string `s` is unknown at compile time - it is instantiated in the `setup` method at runtime from `n`, which itself is only known at runtime and "injected" by JMH using the annotations. I don't think the optimisation you mention is possible in this code (and the results tend to prove it: the larger the number, the longer the operation). – assylias May 28 '15 at 14:35
  • @assylias OK, fair enough. Also, you only check equality; non-equality is probably faster to calculate for (longer)strings. – atlaste May 28 '15 at 14:39
1

Interesting question. For clarity, there's a lot you can do wrong with testing this, because Java does stuff with strings that might influence the results. So let's start with building a proper test.

Constructing your test

Specifically: a proper test doesn't rely on the loadstring, because that influences memory allocation. You want to make a test using dynamically constructed strings.

The 10-log of your integer (e.g. the length of the string) will influence the test outcome. The longer the string, the longer Integer.tryParse will take. If you have a longer string, it will need to calculate more div/mul and take longer. An additional thing that influences performance is the '-' sign. If you have unsigned integers, this should be taken into account.

Basically measuring means:

  • Create strings with the proper length (depends on your data!!!). More strings = better
  • Create fail/pass an integer array that matches (or doesn't) with the string array.
  • Garbage collect.
  • Test with those two arrays.

Be sure to make a huge array for this during the test, so that your tests won't be influenced. Also make sure that the integers / random numbers that you use have the same characteristics as your data... Because of this, I cannot execute the tests for you, so I'll just stick to the theory.

String to integer equality

It helps to know how string to integer conversion works, so let's start with a blunt solution and work our way up. I currently don't have Java on my laptop, so I'm sorry for the C# syntax :-) You should be easily able to fix it though...

public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-') // 1
    {
        for (int i = 1; i < s.Length; ++i )
        {
            if (s[i] >= '0' && s[i] <= '9') // 2
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            if (s[i] >= '0' && s[i] <= '9')
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}

If you know for a fact that the numbers in the string are never negative, you can of course drop the condition 1. Also, if you know for a fact that the string is always a number (which is IMO implied in your question), you can optimize 2. A little trick that I usually use is to use arithmetic overflows to generate large unsigned numbers, which in turn removes an additional condition from 2. You'll end up with:

public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-')
    {
        for (int i = 1; i < s.Length; ++i )
        {
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}

Next up, you want equality instead of conversion. So, how lazy can we evaluate this? Well, we need to parse pretty much all of the string before we can do the check. The only thing we know is that if we encounter a '-' char, we also need a negative integer. I ended up with this:

public bool EqualsStringInt(string s, int value)
{
    int val = 0;
    if (s[0] == '-')
    {
        if (value >= 0) { return false; } // otherwise we expected another char

        for (int i = 1; i < s.Length; ++i )
        {
            val = val * 10 + s[i] - '0'; // s[i] must be a char between '0'-'9' as implied by the question.
        }
        return (-val) == value;
    }
    else
    {
        if (value < 0) { return false; } // otherwise we expected another char

        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val == value;
    }
}

Integer to string equality

I've written a bit of code in the past for C++ that converts integers to strings here: C++ performance challenge: integer to std::string conversion . There are some good solutions here as well that might be worth considering if you're really looking for performance.

Just checking equality is easier than that though. If you look closely at the algorithm, you'll notice:

  • Buffer overallocation. You don't need that. your tests WILL go wrong here if you don't wait for the GC and/or use static strings to seed the process!
  • Buffer reallocation. If you've filled the buffer sequentially, you need to invert it as well. If you don't want for GC, this will influence the test outcome!

Both of these should be time consuming in the long run, and both of them will influence your tests.

At this point, it's interesting to note that you don't really need the complete string though - you just need a single character. So, let's work with that:

  • Equality fails if the sign doesn't match
  • Equality fails if the first character doesn't match
  • Equality succeeds if all characters that are generated are the same.

Or, in code:

public bool EqualsIntString(int value, string s)
{
    if (s.Length == 0) { return false; } // This is never good.
    if ((s[0] == '-' && value >= 0) || (s[0] != '-' && value < 0)) { return false; } // positive/negative check

    // Define the limit. This is basically the end of the string to check.
    int limit = 0;
    if (value < 0) // 1
    {
        limit = 1;
        value = -value;
    }

    for (int i=s.Length-1; i>=limit; --i)
    {
        char expected = (char)('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
        value /= 10; // same story.

        if (s[i] != expected) { return false; }
    }
    return true;
}

Again, if you don't have negative numbers, do the obvious optimization. by removing 1.

Can you do even faster? Well yes... that's why I posted the C++ link in the first place. Most of these algorithms can easily be adjusted for this 'equality' case.

Optional optimizations for the last solution

You can use a 10log to determine the length of the string. This implies a lower and an upper bound value to the integer. A simple lookup table can do this for you. However, 10log is quite slow if not properly implemented, so be sure to test this!

Which one is faster

Construct a proper test, and test it. I tried to test it here, but don't have the characteristics of your data, which I expect to make a difference.

Of course, if you don't need such blunt performance, use the standard implementations and equals, and test it.

Community
  • 1
  • 1
atlaste
  • 30,418
  • 3
  • 57
  • 87
  • your method is slower when the two strings are equal but faster when they aren't, as expected, – assylias May 28 '15 at 15:07
  • @assylias thanks for doing the test. Care to share the results of both functions? I did a quick test in c# on my laptop, and noticed 1s for IntToString / 1.2s for StringToInt (100M comparisons) - so I'm interested in the results for Java. – atlaste May 28 '15 at 15:10
  • I've updated my answer. Your method processes 100M equal strings/ints of 5 digits in 1.5s, but in only 0.4s if they are unequal. That's on my desktop PC which is fairly powerful. – assylias May 28 '15 at 15:12
  • 1
    @assylias Funny how branch prediction bites me in the bottoms. Thanks for sharing the results, it now all depends on his data I suppose :-) – atlaste May 28 '15 at 15:16
  • @atlaste, What if value=9999, s="9" ; value = 0 s="+0"; I've updated my question like your EqualsIntString() function – lab bhattacharjee May 29 '15 at 08:26
  • @labbhattacharjee As I said, there are plenty of boundary cases, such as `s="apehaar"`, thousands-delimiters, etc, etc. Normally you avoid these by fixing the locale (or more precise: the algorithm) when generating the strings - otherwise you're in for a lot of pain regardless of the solution you choose. For example: note that a ',' and a '.' means something entirely different in Dutch and English locales (thousand separator / fraction) -- there's no magic trick to fix that, not even `Integer.parseInt` will get you the results you want. – atlaste May 29 '15 at 11:10
0

I'd bet that

Integer.parseInt(str) == integer

is faster. All it takes is one multiplication and one addition per digit, while Integer.toString needs division and modulus per digit - the slowest int arithmetic operations.

Actually, there are some optimizations reducing the number of operations in Integer.toString, but the speed ratio is just too big.

Moreover, there's memory allocation for the new string.

OTOH, Integer.parseInt accepts all Unicode digits, not only 0 to 9, which slows it down a lot. I guess, Guava's Ints.parseInt is way faster.


In case the string can't be parsed, then you'd get a NumberFormatException. You can catch it, but it costs a lot of time and then Integer.toString is the winner.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
-1

I would go for the Interger.Parse(), which is likely to be more resilient in the face of different localisation cultures etc than the other approach...

Speed isn't really the issue - valid results are.

Martin Milan
  • 6,346
  • 2
  • 32
  • 44
  • I love anonymous downvotes... FIne - calculate the wrong answer slowly - see if I care lol – Martin Milan May 28 '15 at 14:02
  • I think it's because of the "Speed isn't really the issue" -- I'm not sure if that's a valid assumption; I think it's valid to assume (some) people don't ask about speed without a good reason. Still, I feel you make a good point with the `Integer.Parse` so I think that's a bit unfair: Equality with culture differences can ruin your results, so it's wise to work this way if that can be an issue. – atlaste May 28 '15 at 14:24
  • This site could be improved immensely if people who placed downvotes were required to post a statement as to why. They could remain anonymous if necessary, but in so many cases (not just this one) there might be value in what the cretin (sorry - valued community member...) is thinking. We should capture that. – Martin Milan May 28 '15 at 14:41
  • @MartinMilan This has been proposed several times on meta and rejected. One of the reasons was that it nearly always leads to lengthy discussions with not much gain. +++ I didn't downvote you, but note that the OP asks about the speed and you said nothing about it. – maaartinus May 28 '15 at 19:34
  • No - I thought he might be interested in getting the correct answer, rather than the wrong answer quickly lol... Take your point though... – Martin Milan May 29 '15 at 07:53
  • @MartinMilan '[...] in getting the correct answer [...]' -- Sorry, I don't believe you do get the point, so allow me to be blunt. I often ask questions about performance, and there are always people that tell me about 'premature optimizations' and that I shouldn't do that... Personally, I find that pretty annoying: we're wasting each other's time with that... after all, if I wanted to know the 'right way to do it', I would have asked about that instead. As such, I really don't believe your answer is the 'correct answer', because it doesn't answer the question, which was "which is *faster*". – atlaste May 30 '15 at 11:27
  • If you're purely interested in speed and nothing else then as you say, my answer is not all that helpful... Allow me to be equally blunt - there are no points for he who raises exceptions quickest... Anyway, this could go on forever, so rest certain in the knowledge that I have a warm fuzzy place in my heart for you, and code however the hell you like. – Martin Milan Jun 01 '15 at 09:33