Very stupid question which I don't get: when comparing numeric strings in Java, why is "01" less than "1"?
Ashley
Because "0" is less than "1". Lexicographic comparisons are done character by character, stopping at the first character that differs.
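A minimal check of the behavior just described (the class name CompareDemo is only illustrative). The first characters already differ, so compareTo returns '0' - '1':

```java
public class CompareDemo {
    public static void main(String[] args) {
        // Index 0 differs: '0' (48) vs '1' (49), so the result is 48 - 49
        System.out.println("01".compareTo("1"));     // -1
        // A negative result means "01" sorts before "1"
        System.out.println("01".compareTo("1") < 0); // true
    }
}
```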
This is described in detail in String.compareTo:
This is the definition of lexicographic ordering. If two strings are different, then either they have different characters at some index that is a valid index for both strings, or their lengths are different, or both. If they have different characters at one or more index positions, let k be the smallest such index; then the string whose character at position k has the smaller value, as determined by using the < operator, lexicographically precedes the other string. In this case, compareTo returns the difference of the two character values at position k in the two string -- that is, the value: this.charAt(k)-anotherString.charAt(k)
If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compareTo returns the difference of the lengths of the strings -- that is, the value: this.length()-anotherString.length()
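To make the quoted contract concrete, here is a hand-written sketch that follows it step by step. It is not the JDK's actual source; lexCompare is a hypothetical helper that mirrors the two rules above (first differing index decides, otherwise the shorter string wins):

```java
public class LexCompare {
    // Hypothetical re-implementation of the contract quoted from the String.compareTo javadoc
    public static int lexCompare(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int k = 0; k < limit; k++) {
            char ca = a.charAt(k);
            char cb = b.charAt(k);
            if (ca != cb) {
                // Smallest differing index k: return this.charAt(k) - anotherString.charAt(k)
                return ca - cb;
            }
        }
        // No differing index: the shorter string precedes the longer one
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        System.out.println(lexCompare("01", "1")); // -1 ('0' - '1')
        System.out.println(lexCompare("1", "10")); // -1 (length 1 - length 2)
        System.out.println(lexCompare("01", "1") == "01".compareTo("1")); // true
    }
}
```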
Because String comparison (the Comparable implementation) relies on a lexicographical comparison of the Unicode value of each character of the compared Strings.
And "1" (U+0031) comes after "0" (U+0030) in the Unicode table.
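The code values behind that statement can be printed directly. This small sketch (class name CodePoints is illustrative) widens each char to int to expose the value that compareTo subtracts:

```java
public class CodePoints {
    public static void main(String[] args) {
        // A char is a 16-bit UTF-16 code unit; casting to int shows its numeric value
        System.out.printf("'0' = U+%04X (%d)%n", (int) '0', (int) '0'); // '0' = U+0030 (48)
        System.out.printf("'1' = U+%04X (%d)%n", (int) '1', (int) '1'); // '1' = U+0031 (49)
    }
}
```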
It works the same way as in a dictionary, not as in a numeric comparison.
You are comparing Strings, not Numbers.
"azerty" comes before "zip" because "a" is before "z".
It is the same thing for "1" and "01".
"0" comes before "1". So "01" < "1".
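The dictionary analogy shows up when sorting with String's natural ordering. In this sketch (sortedNaturally is a hypothetical helper), digits sort before lowercase letters because '0' (48) and '1' (49) are smaller than 'a' (97):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    // Returns a sorted copy; String's natural ordering is String.compareTo (lexicographic)
    public static List<String> sortedNaturally(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedNaturally(Arrays.asList("zip", "1", "azerty", "01")));
        // [01, 1, azerty, zip]
    }
}
```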
You can refer to the String.compareTo() javadoc to better understand the contract of the method.
Suppose you want to compare the numeric values of the two strings, and they are integers. Parse them first; "01" and "1" both parse to 1, so they are numerically equal:
Integer.parseInt("01") == Integer.parseInt("1")
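A sketch of that idea as a reusable comparison, assuming both strings are valid int literals (otherwise Integer.parseInt throws NumberFormatException). compareAsInts is a hypothetical helper name; it also shows a case where numeric and lexicographic order disagree:

```java
public class NumericCompare {
    // Compares two strings by their int value; assumes both are valid int literals
    public static int compareAsInts(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    }

    public static void main(String[] args) {
        System.out.println(compareAsInts("01", "1")); // 0: both parse to 1
        System.out.println(compareAsInts("9", "10")); // -1: 9 < 10 numerically
        System.out.println("9".compareTo("10"));      // 8: '9' - '1', so "9" > "10" lexicographically
    }
}
```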
How strings are ordered for a human reader can depend on the Locale; in Java, locale-sensitive ordering is done by a java.text.Collator, whereas String.compareTo simply compares char values. For Western languages you have a simple ordering of characters, and you start from the left. Here the leftmost characters are "0" and "1" respectively, and "0" has a lower Unicode value than "1", which immediately decides that the string starting with "0" comes before the one starting with "1".
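Locale-aware ordering in Java goes through java.text.Collator (String.compareTo itself does not consult a Locale). A minimal sketch, assuming Locale.ENGLISH is a representative Western locale; collate is a hypothetical helper name. In the default collation rules the digit '0' is ordered before '1', so "01" should still precede "1":

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorDemo {
    // Compares two strings using the collation rules of the given Locale
    public static int collate(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // The first characters '0' and '1' already differ at the primary level
        System.out.println(collate(Locale.ENGLISH, "01", "1"));
    }
}
```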
That strings happen to resemble something else (such as integers, character by character) does not concern this mechanism. If that resemblance is relevant to the task you need to solve, you have to write code that takes it into account.
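One way to take it into account, sketched under the assumption that every string consists only of ASCII digits '0' to '9': compare by numeric value without parsing, so strings too long for int still work. BY_NUMERIC_VALUE and stripLeadingZeros are hypothetical names, not a library API:

```java
import java.util.Comparator;

public class DigitStringComparator {
    // Hypothetical comparator for digit-only strings, ordering them by numeric value
    public static final Comparator<String> BY_NUMERIC_VALUE = (a, b) -> {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        // Without leading zeros, a longer digit string is a larger number
        if (x.length() != y.length()) {
            return Integer.compare(x.length(), y.length());
        }
        // Same length: lexicographic order of digits equals numeric order
        return x.compareTo(y);
    };

    private static String stripLeadingZeros(String s) {
        int i = 0;
        // Keep at least one digit so "000" becomes "0"
        while (i < s.length() - 1 && s.charAt(i) == '0') {
            i++;
        }
        return s.substring(i);
    }

    public static void main(String[] args) {
        System.out.println(BY_NUMERIC_VALUE.compare("01", "1")); // 0
        System.out.println(BY_NUMERIC_VALUE.compare("9", "10")); // -1
    }
}
```

Comparing lengths before characters is what turns lexicographic order into numeric order here; a list of such strings can then be sorted with list.sort(DigitStringComparator.BY_NUMERIC_VALUE).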