Interesting question. To be clear, there's a lot you can get wrong when testing this, because Java does things with strings that can influence the results. So let's start by building a proper test.
Constructing your test
Specifically: a proper test doesn't rely on string literals (constants loaded straight from the class file), because that influences memory allocation. You want to make a test using dynamically constructed strings.
The base-10 logarithm of your integer (i.e. the length of the string) will influence the test outcome. The longer the string, the longer Integer.parseInt will take, because it needs to do more div/mul operations. An additional thing that influences performance is the '-' sign. If you only have non-negative integers, this should be taken into account.
Basically measuring means:
- Create strings with the proper length (this depends on your data!). More strings = better.
- Create an integer array whose entries match (pass) or don't match (fail) the corresponding entries of the string array.
- Garbage collect.
- Test with those two arrays.
Be sure to make a huge array for this during the test, so that your tests won't be influenced. Also make sure that the integers / random numbers that you use have the same characteristics as your data... Because of this, I cannot execute the tests for you, so I'll just stick to the theory.
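The steps above could look roughly like this in Java. This is only a sketch: the class name EqualityTestData, the parameters (count, digits, matchRatio, seed) and the uniform random distribution are assumptions for illustration, and you should replace them with whatever matches your real data. The timing loop in main is deliberately naive and does no JIT warm-up.

```java
import java.util.Random;

class EqualityTestData {
    final String[] strings;
    final int[] ints;

    // Builds `count` pairs; roughly `matchRatio` of them are equal.
    // `digits` is the maximum number of decimal digits (assumed 1..9).
    EqualityTestData(int count, int digits, double matchRatio, long seed) {
        Random rnd = new Random(seed);
        int bound = 1;
        for (int d = 0; d < digits; d++) bound *= 10; // exclusive upper bound
        strings = new String[count];
        ints = new int[count];
        for (int i = 0; i < count; i++) {
            int n = rnd.nextInt(bound);
            if (rnd.nextBoolean()) n = -n;           // mix in negative numbers
            strings[i] = Integer.toString(n);        // built at run time, not a literal
            ints[i] = rnd.nextDouble() < matchRatio ? n : n + 1; // n + 1 never equals n
        }
    }

    public static void main(String[] args) {
        EqualityTestData data = new EqualityTestData(1_000_000, 6, 0.5, 42L);
        System.gc(); // only a hint to the JVM
        long start = System.nanoTime();
        int matches = 0;
        for (int i = 0; i < data.strings.length; i++) {
            // Replace this line with the comparison you want to measure.
            if (Integer.parseInt(data.strings[i]) == data.ints[i]) matches++;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(matches + " matches in " + elapsedMs + " ms");
    }
}
```

Using a fixed seed keeps runs comparable, so every candidate implementation sees exactly the same input.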
String to integer equality
It helps to know how string to integer conversion works, so let's start with a blunt solution and work our way up. I currently don't have Java on my laptop, so I'm sorry for the C# syntax :-) You should easily be able to port it, though.
public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-') // 1
    {
        for (int i = 1; i < s.Length; ++i)
        {
            if (s[i] < '0' || s[i] > '9') // 2: reject anything that is not a digit
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            if (s[i] < '0' || s[i] > '9') // 2: reject anything that is not a digit
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}
If you know for a fact that the numbers in the string are never negative, you can of course drop condition 1. Also, if you know for a fact that the string is always a number (which is IMO implied in your question), you can drop check 2. If you do need to keep it, a little trick that I usually use is to rely on arithmetic overflow: subtract '0' and treat the result as unsigned, so that characters below '0' become large numbers, which removes one of the two comparisons in 2. Without the digit check, you'll end up with:
public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-')
    {
        for (int i = 1; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}
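If you do want to keep the digit validation, the overflow trick could be sketched in Java as follows. Java's char is an unsigned 16-bit type, so casting the difference back to char makes everything below '0' wrap around to a large value, and a single comparison is enough. The names DigitCheck, isDigit and parseChecked are made up for this example; the sketch assumes a non-empty string and does no overflow detection.

```java
class DigitCheck {
    // One comparison instead of two: for c < '0' the subtraction is negative,
    // and the cast to char (unsigned 16-bit) wraps it to a value above 9.
    static boolean isDigit(char c) {
        return (char) (c - '0') <= 9;
    }

    // Parses an optionally negative decimal string, validating each digit.
    // Assumes s is non-empty and the result fits in an int.
    static int parseChecked(String s) {
        int start = (s.charAt(0) == '-') ? 1 : 0;
        int val = 0;
        for (int i = start; i < s.length(); i++) {
            char d = (char) (s.charAt(i) - '0');
            if (d > 9) {
                throw new NumberFormatException(s);
            }
            val = val * 10 + d;
        }
        return start == 1 ? -val : val;
    }
}
```

Whether this is actually faster than the two-comparison version depends on the JIT, so measure it with your own data.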
Next up, you want equality instead of conversion. So, how lazy can we evaluate this? Well, we need to parse pretty much all of the string before we can do the check. The only thing we know is that if we encounter a '-' char, we also need a negative integer. I ended up with this:
public bool EqualsStringInt(string s, int value)
{
    int val = 0;
    if (s[0] == '-')
    {
        if (value >= 0) { return false; } // otherwise we expected another char
        for (int i = 1; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0'; // s[i] must be a char between '0'-'9' as implied by the question.
        }
        return (-val) == value;
    }
    else
    {
        if (value < 0) { return false; } // otherwise we expected another char
        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val == value;
    }
}
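Since the question is about Java, here is a rough Java port of the same idea. The class name StringIntEquality is just a placeholder. As in the C# version, it assumes a non-empty string that contains only digits after an optional '-', and a number that fits in an int; a string that overflows would wrap around silently and could compare equal by accident.

```java
class StringIntEquality {
    // Assumes s is non-empty and, apart from an optional leading '-',
    // contains only the characters '0'..'9'.
    static boolean equalsStringInt(String s, int value) {
        boolean negative = s.charAt(0) == '-';
        if (negative != (value < 0)) {
            return false; // signs differ, no need to parse anything
        }
        int val = 0;
        for (int i = negative ? 1 : 0; i < s.length(); i++) {
            val = val * 10 + (s.charAt(i) - '0');
        }
        return (negative ? -val : val) == value;
    }

    public static void main(String[] args) {
        System.out.println(equalsStringInt("123", 123));
        System.out.println(equalsStringInt("-123", 123));
    }
}
```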
Integer to string equality
I've written a bit of C++ code in the past that converts integers to strings; see "C++ performance challenge: integer to std::string conversion". There are some good solutions there as well that might be worth considering if you're really looking for performance.
Just checking equality is easier than that though. If you look closely at the algorithm, you'll notice:
- Buffer overallocation. You don't need that. Your tests WILL go wrong here if you don't wait for the GC and/or if you use static strings to seed the process!
- Buffer reallocation. If you've filled the buffer sequentially, you need to reverse it as well. If you don't wait for the GC, this will influence the test outcome!
Both of these will be time consuming in the long run, and both of them will influence your tests.
At this point, it's interesting to note that you don't really need the complete string though - you just need a single character at a time. So, let's work with that:
- Equality fails if the sign doesn't match
- Equality fails as soon as a generated character doesn't match
- Equality succeeds if all characters that are generated are the same and no digits of the integer are left over.
Or, in code:
public bool EqualsIntString(int value, string s)
{
    if (s.Length == 0) { return false; } // This is never good.
    if ((s[0] == '-' && value >= 0) || (s[0] != '-' && value < 0)) { return false; } // positive/negative check
    // Define the limit. This is basically the end of the string to check.
    int limit = 0;
    if (value < 0) // 1
    {
        limit = 1;
        value = -value; // note: int.MinValue cannot be negated; handle it separately if it can occur
    }
    for (int i = s.Length - 1; i >= limit; --i)
    {
        char expected = (char)('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
        value /= 10; // same story.
        if (s[i] != expected) { return false; }
    }
    return value == 0; // digits left over means the string was shorter than the number
}
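A Java version of the same right-to-left comparison might look like the sketch below. The class name IntStringEquality is hypothetical. Two details are my own additions rather than part of the C# code: the magnitude is held in a long so that Integer.MIN_VALUE can be negated safely, and the method checks that no digits of the number remain once the string is exhausted. Leading zeros in the string (such as "007" for 7) are accepted by this sketch.

```java
class IntStringEquality {
    static boolean equalsIntString(int value, String s) {
        if (s.isEmpty()) {
            return false;
        }
        boolean negative = value < 0;
        if ((s.charAt(0) == '-') != negative) {
            return false; // sign mismatch
        }
        int limit = negative ? 1 : 0;        // index of the first digit in s
        long rest = Math.abs((long) value);  // long avoids overflow for Integer.MIN_VALUE
        for (int i = s.length() - 1; i >= limit; i--) {
            // Generate the digits of the number from least to most significant.
            if (s.charAt(i) != (char) ('0' + rest % 10)) {
                return false;
            }
            rest /= 10;
        }
        return rest == 0; // leftover digits mean the string was too short
    }
}
```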
Again, if you don't have negative numbers, do the obvious optimization by removing 1.
Can you do even faster? Well yes... that's why I posted the C++ link in the first place. Most of these algorithms can easily be adjusted for this 'equality' case.
Optional optimizations for the last solution
You can use a base-10 logarithm to determine the length of the string, since each length implies a lower and an upper bound for the integer. A simple lookup table can do this for you. However, a base-10 logarithm is quite slow if not properly implemented, so be sure to test this!
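One way to do the lookup without computing an actual logarithm is a small table of powers of ten. The sketch below is an assumption about how such a pre-check could be written (DigitCount, digits and lengthMatches are invented names); it only handles non-negative values and assumes the string has no leading zeros.

```java
class DigitCount {
    // POWERS_OF_TEN[k] is 10^k, the smallest value with k + 1 decimal digits (for k >= 1).
    private static final int[] POWERS_OF_TEN = {
        1, 10, 100, 1_000, 10_000, 100_000, 1_000_000,
        10_000_000, 100_000_000, 1_000_000_000
    };

    // Number of decimal digits of a non-negative int (0 counts as one digit).
    static int digits(int value) {
        int n = 1;
        while (n < POWERS_OF_TEN.length && value >= POWERS_OF_TEN[n]) {
            n++;
        }
        return n;
    }

    // Cheap pre-check: without leading zeros, the string can only equal
    // the value if its length equals the number of digits.
    static boolean lengthMatches(int value, String s) {
        return value >= 0 && s.length() == digits(value);
    }
}
```

If lengthMatches returns false you can skip the digit-by-digit comparison entirely; whether the extra check pays off depends on how often your lengths actually differ.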
Which one is faster
Construct a proper test, and test it. I tried to test it here, but don't have the characteristics of your data, which I expect to make a difference.
Of course, if you don't need such raw performance, use the standard implementations and equals, and test it.
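For completeness, the two standard-library routes in Java could be written like this. The wrapper names (StandardEquality, viaParse, viaToString) are mine; Integer.parseInt and Integer.toString are the standard methods. Note that the two are not strictly equivalent: parsing accepts forms such as "007" or "+7", while the string comparison only accepts the canonical representation.

```java
class StandardEquality {
    // Convert the string to an int and compare numbers.
    static boolean viaParse(String s, int value) {
        try {
            return Integer.parseInt(s) == value;
        } catch (NumberFormatException e) {
            return false; // not a valid int, so it cannot be equal
        }
    }

    // Convert the int to a string and compare strings.
    static boolean viaToString(String s, int value) {
        return Integer.toString(value).equals(s);
    }
}
```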