1

I have a comparator that sorts an array of strings that contain letters and numbers, but can't seem to identify the regular expression that sorts them in the manner I am looking for.

I have used this question as a reference for my comparator.

array={string-a01,string-a20,string-a100,string-b01,string-b20,string-b100,string-c01,string-c20,string-c100 etc.}

Collections.sort(array, new Comparator<String>(){       
    public int compare(String o1, String o2) {
        return extractInt(o1) - extractInt(o2);
    }

    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});
        
for (String element : array) {
    System.out.println(element);
}

Before Introducing the comparator the output was:
string-a01, string-a100, string-a20, string-b01, string-b100, string-b20, string-c01, string-c20, string-c100

The output that this code produces is:
string-a01, string-b01, string-c01 string-a20, string-b20, string-c20 string-a100, string-b100, string-c100

The output I would like it to produce is:
string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100


EDIT: Edited for clarification. Array has been changed and output before the comparator was added.

Community
  • 1
  • 1
Jon
  • 31
  • 4

3 Answers3

3

Assuming that the string part is actually something else than just "string". You can extract the letter part of the ending, and the digit part, and compare those using a composite Comparator:

String[] array = { "string-a20", "string-a01", "string-b01",
    "string-b20", "string-c01", "string-c20",
    "string-a100", "string-b100", "string-c100" };

Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

List<String> result = Arrays.stream(array)
    .map(p::matcher)
    .filter(Matcher::find)
    .sorted(Comparator.comparing((Matcher m) -> m.group(1)) // Compare the letter part
        .thenComparingInt(m -> Integer.parseInt(m.group(2)))) // Compare the number part
    .map(m -> m.group(0)) // Map back to String
    .collect(Collectors.toList());

System.out.println(result);

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]

Legacy version (With the downside of having to recreate Matchers):

Arrays.sort(array, new Comparator<String>() {

    Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

    @Override
    public int compare(String o1, String o2) {
        Matcher m1 = p.matcher(o1);
        Matcher m2 = p.matcher(o2);

        if(!(m1.find() && m2.find()))
            return 0; // Or throw a format exception

        int comparison = m1.group(1).compareTo(m2.group(1));
        return comparison != 0
            ? comparison 
            : Integer.compare(Integer.parseInt(m1.group(2)), Integer.parseInt(m2.group(2)));
    }

});
GingerHead
  • 8,130
  • 15
  • 59
  • 93
Jorn Vernee
  • 31,735
  • 4
  • 76
  • 93
  • 1
    I have made another update because the inital question didnt ask for everything i was looking for. The issue is that it orders like this: b01, b100, b11, c01, c100, c11... the array is generated dynamically and didnt realize the problem wouldnt occur with the original array posted – Jon Aug 18 '16 at 16:17
  • 1
    @Jon, Yeah I was looking into that, but it's a little more complicated. – Jorn Vernee Aug 18 '16 at 16:18
  • 1
    @Jon [PLUS ONE] This must be the answer because of using lambdas perfectly to elaborate the answer in the most easiest way –  Aug 18 '16 at 16:44
  • @JornVernee Unfortunately, the version of java I'm running doesn't support lambdas and on this project upgrading isn't an option – Jon Aug 18 '16 at 17:44
  • 1
    @Jon That's unfortunate, I added a legacy version too. – Jorn Vernee Aug 18 '16 at 17:54
  • 1
    @JornVernee Works like a charm! For anyone with the same issue dont forget the `private final Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");` Thanks!! – Jon Aug 18 '16 at 18:05
  • @jon, OK it's added now – GingerHead Aug 19 '16 at 10:21
1

You are removing the alphabetical characters in your extractInt method, so you won't be able to use them in the comparison.

You should just sort them with no Comparator, which will sort them using the default, lexicographical sorting algorithm (java.lang.String implements Comparable<String>).

Example

// test array
String[] s = {"string-a01","string-a01","string-b01","string-b02","string-c02","string-c02"};

// sorting with null Comparator, will sort if the type implements Comparable - 
// which String does
Arrays.sort(s);

// printing in human-readable form
System.out.println(
    Arrays.toString(s)
);

Output

[string-a01, string-a01, string-b01, string-b02, string-c02, string-c02]

Notes

  • If you want to remove duplicates (which might be your intent from the question - not clear), add the array elements to a TreeSet instead:

    Set<String> deduplicated = new TreeSet<>(Arrays.asList(s));
    
  • If your sorting algorithm must act so that 2 comes before 12, then you need to extract the integer value without removing it from the elements, and compare it only when the rest of the Strings are equal.

Andy Turner
  • 137,514
  • 11
  • 162
  • 243
Mena
  • 47,782
  • 11
  • 87
  • 106
  • Why don't you use lambda, easier – GingerHead Aug 18 '16 at 15:54
  • @GingerHead how is using the Java 8 stream API any "easier" than `Arrays.sort`, given this context? – Mena Aug 18 '16 at 15:55
  • For example `Arrays.sort(s, (a, b) -> a.length() - b.length());` – GingerHead Aug 18 '16 at 15:58
  • @GingerHead that will only achieve the desired result because the `String`s in the example are of the same length. Length of the `String` is nowhere a sorting criterium in OP's question, implied or not. – Mena Aug 18 '16 at 16:00
  • I just gave you a hint so that you would continue hereafter – GingerHead Aug 18 '16 at 16:01
  • @GingerHead `Arrays.sort(s)` is clearly simpler than `Arrays.sort(s, something)`. – Andy Turner Aug 18 '16 at 16:02
  • @GingerHead thanks, I know how to use lambdas (at least for simple cases like this one). I'm just hinting that there's no good reason to use lambdas here, since OP's case doesn't seem to require anything but default sorting logic for `String`s. – Mena Aug 18 '16 at 16:03
  • @AndyTurner oops - too much distraction / not enough coffee. – Mena Aug 18 '16 at 16:04
  • @AndyTurner Regarding the complexity of its strings in the array better to use a rule than nothing to be on the safe side – GingerHead Aug 18 '16 at 16:07
  • 1
    @GingerHead nope. It's sufficient to rely upon well-documented behaviour. – Andy Turner Aug 18 '16 at 16:07
  • @AndyTurner What if the elements changed in the future, usability is one of the major pillars of robust code architecture – GingerHead Aug 18 '16 at 16:09
  • @GingerHead I don't know what you mean. There is nothing unusable about `Arrays.sort`. There is nothing more robust about using a lambda function. – Andy Turner Aug 18 '16 at 16:11
  • @AndyTurner I am addressing the changeability of the requirements – GingerHead Aug 18 '16 at 16:12
  • 1
    @Mena I have made another update because the inital question didnt ask for everything i was looking for. The issue is that it orders like this: b01, b100, b11, c01, c100, c11... the array is generated dynamically and didnt realize the problem wouldnt occur with the original array posted – Jon Aug 18 '16 at 16:18
  • @Jon see note 2 in my answer, or Andy's answer. – Mena Aug 18 '16 at 16:37
1

It sounds like you want to order the strings on the "leading strings", i.e. everything up to the digits; if the leading strings are equal, then compare on the subsequent digits.

To split the string into its "string" and "integer" parts, you can first the "first trailing digit", i.e. the position of the first character in the string where there are no non-digits between it and the end of the string:

int firstTrailingDigit(String s) {
  int i = s.length();
  while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
    --i;
  }
  return i;
}

You can then use this in your comparator:

public int compare(String a, String b) {
  int ftdA = firstTrailingDigit(a);
  int ftdB = firstTrailingDigit(b);

  // Get the leading strings, and compare.
  String sA = a.substring(0, ftdA);
  String sB = b.substring(0, ftdB);
  int compareStrings = sA.compareTo(sB);
  if (compareStrings != 0) {
    // If they're not equal, return the result of the comparison.
    return compareStrings;
  }

  // Get the trailing numbers from the strings, and compare.
  int iA = Integer.parseInt(a.substring(ftdA));
  int iB = Integer.parseInt(b.substring(ftdB));
  return Integer.compare(iA, iB);
}

Ideone demo

Input:

String[] array = {"string-a01","string-a20","string-a100","string-b01","string-b20","string-b100","string-c01","string-c20","string-c100"};

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]
Andy Turner
  • 137,514
  • 11
  • 162
  • 243