This builds on the answer by @stanislav.
I ran into a few problems while using that answer:
- Uppercase and lowercase letters are separated by the characters that sit between them in ASCII. This breaks the ordering when the strings being sorted contain _ or other characters that fall between the uppercase and lowercase letters in ASCII.
- If two strings differ only in the number of leading zeroes, the function returns 0, so their relative order depends on their original positions in the list.
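To illustrate the first problem, here is a small sketch of my own (not the code from the referenced answer). It sorts a few one-character strings by raw char value and then by lowercased value. The class name AsciiOrderSketch and both helper methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class AsciiOrderSketch {

    // Sorts by raw char values, the same ordering String.compareTo uses.
    static List<String> sortByCharCode(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts by the lowercased form, so letters of either case stay together.
    // The sort is stable, so strings with equal keys keep their input order.
    static List<String> sortByLowerCasedChar(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "_", "A", "a", "B");
        // 'A' = 65, 'B' = 66, '_' = 95, 'a' = 97, 'b' = 98
        System.out.println(sortByCharCode(input));       // [A, B, _, a, b]
        System.out.println(sortByLowerCasedChar(input)); // [_, A, a, b, B]
    }
}
```

In the first result the underscore lands between the uppercase and lowercase letters, which is the split described above.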
These two issues have been fixed in the new code. I also extracted a few helper functions to replace repeated blocks of code. Strings that differ only in the number of leading zeroes are now ordered by their total length, so the one with fewer zeroes comes first. The differentCaseCompared variable records whether two strings are identical except for letter case. If so, the difference between the first pair of characters that differ only by case is returned, so that two strings differing only by case no longer compare as 0.
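The case tie-break idea on its own can be sketched as follows. This is a simplified illustration without any of the number handling, and the class and method names (CaseTieBreakSketch, compareWithCaseTieBreak) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CaseTieBreakSketch {

    // Compares case-insensitively first; if the strings turn out to be equal
    // apart from case, falls back to the first case-only difference seen.
    static int compareWithCaseTieBreak(String a, String b) {
        int tieBreak = 0;
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue;
            }
            int diff = Character.toLowerCase(c1) - Character.toLowerCase(c2);
            if (diff != 0) {
                return diff; // a real difference decides immediately
            }
            if (tieBreak == 0) {
                tieBreak = c1 - c2; // remember only the first case-only difference
            }
        }
        if (a.length() != b.length()) {
            return a.length() - b.length(); // the shorter string sorts first
        }
        return tieBreak;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(
                Arrays.asList("t1ab", "t1aB", "t1A", "t1Ab", "t1a", "t1AB"));
        list.sort(CaseTieBreakSketch::compareWithCaseTieBreak);
        System.out.println(list); // [t1A, t1a, t1AB, t1Ab, t1aB, t1ab]
    }
}
```

Because the tie-break is only consulted after everything else compares equal, it never overrides a real difference later in the string.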
import java.util.Comparator;

public class NaturalSortingComparator implements Comparator<String> {

    @Override
    public int compare(String string1, String string2) {
        int lengthOfString1 = string1.length();
        int lengthOfString2 = string2.length();
        int iteratorOfString1 = 0;
        int iteratorOfString2 = 0;
        int differentCaseCompared = 0;
        while (true) {
            if (iteratorOfString1 == lengthOfString1) {
                if (iteratorOfString2 == lengthOfString2) {
                    if (lengthOfString1 == lengthOfString2) {
                        // Same length and no decisive difference found:
                        // return the recorded case difference (0 if there was none)
                        return differentCaseCompared;
                    }
                    // Both strings are exhausted but their lengths differ, which only
                    // happens when leading zeroes were skipped: the shorter one sorts first
                    return lengthOfString1 - lengthOfString2;
                }
                // string1 ran out first, so it sorts before string2
                return -1;
            }
            // string2 ran out first, so it sorts before string1
            if (iteratorOfString2 == lengthOfString2) {
                return 1;
            }
            char ch1 = string1.charAt(iteratorOfString1);
            char ch2 = string2.charAt(iteratorOfString2);
            if (Character.isDigit(ch1) && Character.isDigit(ch2)) {
                // skip leading zeroes
                iteratorOfString1 = skipLeadingZeroes(string1, lengthOfString1, iteratorOfString1);
                iteratorOfString2 = skipLeadingZeroes(string2, lengthOfString2, iteratorOfString2);
                // find the ends of the numbers
                int endPositionOfNumbersInString1 = findEndPositionOfNumber(string1, lengthOfString1, iteratorOfString1);
                int endPositionOfNumbersInString2 = findEndPositionOfNumber(string2, lengthOfString2, iteratorOfString2);
                int lengthOfDigitsInString1 = endPositionOfNumbersInString1 - iteratorOfString1;
                int lengthOfDigitsInString2 = endPositionOfNumbersInString2 - iteratorOfString2;
                // if the lengths are different, then the longer number is bigger
                if (lengthOfDigitsInString1 != lengthOfDigitsInString2) {
                    return lengthOfDigitsInString1 - lengthOfDigitsInString2;
                }
                // compare numbers digit by digit
                while (iteratorOfString1 < endPositionOfNumbersInString1) {
                    if (string1.charAt(iteratorOfString1) != string2.charAt(iteratorOfString2)) {
                        return string1.charAt(iteratorOfString1) - string2.charAt(iteratorOfString2);
                    }
                    iteratorOfString1++;
                    iteratorOfString2++;
                }
            } else {
                // plain character comparison
                if (ch1 != ch2) {
                    if (!ignoreCharacterCaseEquals(ch1, ch2)) {
                        return Character.toLowerCase(ch1) - Character.toLowerCase(ch2);
                    }
                    // Record the difference if the characters differ only by case.
                    // Only the first such difference counts, hence the check against 0.
                    if (differentCaseCompared == 0) {
                        differentCaseCompared = ch1 - ch2;
                    }
                }
                iteratorOfString1++;
                iteratorOfString2++;
            }
        }
    }

    private boolean ignoreCharacterCaseEquals(char character1, char character2) {
        return Character.toLowerCase(character1) == Character.toLowerCase(character2);
    }

    private int findEndPositionOfNumber(String string, int lengthOfString, int end) {
        while (end < lengthOfString && Character.isDigit(string.charAt(end))) {
            end++;
        }
        return end;
    }

    private int skipLeadingZeroes(String string, int lengthOfString, int iteratorOfString) {
        while (iteratorOfString < lengthOfString && string.charAt(iteratorOfString) == '0') {
            iteratorOfString++;
        }
        return iteratorOfString;
    }
}
The following is a unit test I used.
// Imports assume JUnit 4 (org.junit.Assert and org.junit.Test).
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.junit.Assert;
import org.junit.Test;

public class NaturalSortingComparatorTest {

    private static final int NUMBER_OF_TEST_CASES = 100000;

    @Test
    public void compare() {
        NaturalSortingComparator naturalSortingComparator = new NaturalSortingComparator();
        List<String> expectedStringList = getCorrectStringList();
        List<String> testListOfStrings = createTestListOfStrings();
        runTestCases(expectedStringList, testListOfStrings, NUMBER_OF_TEST_CASES, naturalSortingComparator);
    }

    // Shuffles and re-sorts the list repeatedly, so the result must not depend on the input order.
    private void runTestCases(List<String> expectedStringList, List<String> testListOfStrings,
                              int numberOfTestCases, Comparator<String> comparator) {
        for (int testCase = 0; testCase < numberOfTestCases; testCase++) {
            Collections.shuffle(testListOfStrings);
            testListOfStrings.sort(comparator);
            Assert.assertEquals(expectedStringList, testListOfStrings);
        }
    }

    private List<String> getCorrectStringList() {
        return Arrays.asList(
                "1", "01", "001", "2", "02", "10", "10", "010",
                "20", "100", "_1", "_01", "_2", "_200", "A 02",
                "A01", "a2", "A20", "t1A", "t1a", "t1AB", "t1Ab",
                "t1aB", "t1ab", "T010T01", "T0010T01");
    }

    private List<String> createTestListOfStrings() {
        return Arrays.asList(
                "10", "20", "A20", "2", "t1ab", "01", "T010T01", "t1aB",
                "_2", "001", "_200", "1", "A 02", "t1Ab", "a2", "_1", "t1A", "_01",
                "100", "02", "T0010T01", "t1AB", "10", "A01", "010", "t1a");
    }
}
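A shuffle-and-sort test like the one above can be complemented by a direct check that compare(a, b) and compare(b, a) always have opposite signs. The following is only a sketch: the class and method names are hypothetical, and String.CASE_INSENSITIVE_ORDER stands in for the comparator under test so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorContractSketch {

    // Returns true if, for every pair in the samples, compare(a, b) and
    // compare(b, a) have opposite signs (which also forces compare(a, a) == 0).
    static <T> boolean isAntisymmetric(Comparator<T> comparator, List<T> samples) {
        for (T a : samples) {
            for (T b : samples) {
                int forward = Integer.signum(comparator.compare(a, b));
                int backward = Integer.signum(comparator.compare(b, a));
                if (forward != -backward) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("1", "01", "_1", "A01", "a2", "t1A", "t1a");
        // Stand-in comparator; swap in the comparator under test here.
        System.out.println(isAntisymmetric(String.CASE_INSENSITIVE_ORDER, samples)); // true
        // A deliberately broken comparator that always claims "greater than".
        Comparator<String> broken = (a, b) -> 1;
        System.out.println(isAntisymmetric(broken, samples)); // false
    }
}
```

A check like this fails fast on a specific pair, which is easier to debug than a mismatch in a fully sorted list.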
Suggestions are welcome! I am not sure whether extracting the helper functions changes anything other than readability.
P.S.: Sorry for adding another answer to this question, but I don't have enough reputation to comment on the answer that I modified for my use.