80

I'd like some kind of string comparison function that preserves natural sort order1. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.

I can roll my own (it's not a very hard problem), but I'd rather not re-invent the wheel if I don't have to.

In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".


1 By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5"

IAdapter
  • 62,595
  • 73
  • 179
  • 242
Kip
  • 107,154
  • 87
  • 232
  • 265
  • Funny, everyone - re-read the question and delete the posted answers!! .. :) I guess this is the power of DOWNVOTE!!! :) ;) – OscarRyz Aug 11 '09 at 19:01
  • 2
    BTW. Numeric is not the natural order when talking about strings, so the question is a bit misleading. – OscarRyz Aug 11 '09 at 19:02
  • 2
    Nothing like that built-in to my knowledge. When coding it takes less time than asking on SO, I generally run for my own wheel... :) – glmxndr Aug 11 '09 at 19:10
  • 4
    @Oscar usually natural sort order means that "image10.jpg" is sorted greater than "image9.jpg". in other words, numeric parts of the string are treated as integers and compared as such. my example is no different, except that it is "closer" to a purely numeric value. but the same algorithm would handle both equally well. – Kip Aug 11 '09 at 19:17
  • 9
    Nothing is natural wrt. to version comparison. Is 1.2.10p1 before or after 1.2.10? What about 1.2.10b1, and 1.20.10pre1? – Martin v. Löwis Aug 11 '09 at 19:18
  • @subtenante- i agree, but i'm a few hours of coding away from needing to do the sorting, so i figured i'd let SO work for me while i was working on other stuff. :) – Kip Aug 11 '09 at 19:19
  • @martin `1.2.10 < 1.2.10b1 < 1.2.10p1 < 1.20.10pre1`. i guess it could vary for some development shops, but for me those strings could just as easily have been `album1page2image10.jpg < album1page2image10b1.jpg < album1page2image10p1.jpg < album1page20image10pre1`, the algorithm would be the same both ways. – Kip Aug 11 '09 at 19:23
  • @Kip, I agree, I meant java's "natural" meaning differs. I posted an answer with the findings, let us know if it helps – OscarRyz Aug 11 '09 at 19:28
  • @Oscar @martin I've updated the question to be more explicit as to what I meant – Kip Aug 11 '09 at 19:40

8 Answers8

57

In java the "natural" order meaning is "lexicographical" order, so there is no implementation in the core like the one you're looking for.

There are open source implementations.

Here's one:

NaturalOrderComparator.java

Make sure you read the:

Cougaar Open Source License

I hope this helps!

Pierre-Luc Paour
  • 1,725
  • 17
  • 21
OscarRyz
  • 196,001
  • 113
  • 385
  • 569
  • thanks. i revised the question to clarify what i meant by "natural" – Kip Aug 11 '09 at 19:38
  • 2
    I love SO - just randomly poking around, and here's a quick solution to something I've been putting off for the past week. That's a very useful link - thanks! (And the license is compatible with our project) – Kevin Day Aug 11 '09 at 20:55
  • 5
    When choosing an open source implementation like these you should make sure they do exactly what you expect. Many are focused on providing an order that looks intuitive and pretty in a user interface. Sometimes they accept and skip whitespace, skip leading zeros and most importantly places shorter strings before longer strings when they are equivalent. The string 1.020 will then be placed after 1.20. If you're using it for determining if two versions are equal you can get a false negative in this case. I.e. when checking that compareTo() returns 0. – sigget Sep 27 '11 at 22:50
  • I ported Martin Pool's original code to Java a long time ago. I copied Martin's license in my implementation, but his web site explicitly allows re-implementors to relicense it as they see fit. Contact me if the original license is a problem for you. – Pierre-Luc Paour Jun 24 '14 at 14:11
  • 1
    In case anyone comes to this and discovers the original Coogaar website is offline: a copy of the license is available in the Internet Archive at https://web.archive.org/web/20160814021343/http://cougaar.org:80/wp/license/ – mhucka Oct 23 '17 at 01:02
  • **WARNING**: this implementation will sort the following strings in **this order**: `["4.0", "3.14", "3.0.1"]`. Why? Because the `.` (and the `,`) are treated as "digits" (with a value less than 0, but this is unimportant for this example). – Walter Tross Mar 05 '21 at 08:04
  • to be 100% clear: according to this implementation `"4.0" < "3.14"` and `"3.14" < "3.0.1"` – Walter Tross Mar 05 '21 at 08:31
12

I have tested three Java implementations mentioned here by others and found that their work slightly differently but none as I would expect.

Both AlphaNumericStringComparator and AlphanumComparator do not ignore whitespaces so that pic2 is placed before pic 1.

On the other hand NaturalOrderComparator ignores not only whitespaces but also all leading zeros so that sig[1] precedes sig[0].

Regarding performance AlphaNumericStringComparator is ~x10 slower then the other two.

Mikhail Poda
  • 5,742
  • 3
  • 39
  • 52
  • it has to be like this since AlphaNumericStringComparator is using regex (which is a bad idea anyway) – Rafael T May 08 '13 at 13:39
10

String implements Comparable, and that is what natural ordering is in Java (comparing using the comparable interface). You can put the strings in a TreeSet or sort using the Collections or Arrays classes.

However, in your case you don't want "natural ordering" you really want a custom comparator, which you can then use in the Collections.sort method or the Arrays.sort method that takes a comparator.

In terms of the specific logic you are looking for implementing within the comparator, (numbers separated by dots) I'm not aware of any existing standard implementations of that, but as you said, it is not a hard problem.

EDIT: In your comment, your link gets you here, which does a decent job if you don't mind the fact that it is case sensitive. Here is that code modified to allow you to pass in the String.CASE_INSENSITIVE_ORDER:

    /*
     * The Alphanum Algorithm is an improved sorting algorithm for strings
     * containing numbers.  Instead of sorting numbers in ASCII order like
     * a standard sort, this algorithm sorts numbers in numeric order.
     *
     * The Alphanum Algorithm is discussed at http://www.DaveKoelle.com
     *
     *
     * This library is free software; you can redistribute it and/or
     * modify it under the terms of the GNU Lesser General Public
     * License as published by the Free Software Foundation; either
     * version 2.1 of the License, or any later version.
     *
     * This library is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
     * Lesser General Public License for more details.
     *
     * You should have received a copy of the GNU Lesser General Public
     * License along with this library; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
     *
     */

    import java.util.Comparator;

    /**
     * This is an updated version with enhancements made by Daniel Migowski,
     * Andre Bogus, and David Koelle
     *
     * To convert to use Templates (Java 1.5+):
     *   - Change "implements Comparator" to "implements Comparator<String>"
     *   - Change "compare(Object o1, Object o2)" to "compare(String s1, String s2)"
     *   - Remove the type checking and casting in compare().
     *
     * To use this class:
     *   Use the static "sort" method from the java.util.Collections class:
     *   Collections.sort(your list, new AlphanumComparator());
     */
    public class AlphanumComparator implements Comparator<String>
    {
        private Comparator<String> comparator = new NaturalComparator();

        public AlphanumComparator(Comparator<String> comparator) {
            this.comparator = comparator;
        }

        public AlphanumComparator() {

        }

        private final boolean isDigit(char ch)
        {
            return ch >= 48 && ch <= 57;
        }

        /** Length of string is passed in for improved efficiency (only need to calculate it once) **/
        private final String getChunk(String s, int slength, int marker)
        {
            StringBuilder chunk = new StringBuilder();
            char c = s.charAt(marker);
            chunk.append(c);
            marker++;
            if (isDigit(c))
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (!isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            } else
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            }
            return chunk.toString();
        }

        public int compare(String s1, String s2)
        {

            int thisMarker = 0;
            int thatMarker = 0;
            int s1Length = s1.length();
            int s2Length = s2.length();

            while (thisMarker < s1Length && thatMarker < s2Length)
            {
                String thisChunk = getChunk(s1, s1Length, thisMarker);
                thisMarker += thisChunk.length();

                String thatChunk = getChunk(s2, s2Length, thatMarker);
                thatMarker += thatChunk.length();

                // If both chunks contain numeric characters, sort them numerically
                int result = 0;
                if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0)))
                {
                    // Simple chunk comparison by length.
                    int thisChunkLength = thisChunk.length();
                    result = thisChunkLength - thatChunk.length();
                    // If equal, the first different number counts
                    if (result == 0)
                    {
                        for (int i = 0; i < thisChunkLength; i++)
                        {
                            result = thisChunk.charAt(i) - thatChunk.charAt(i);
                            if (result != 0)
                            {
                                return result;
                            }
                        }
                    }
                } else
                {
                    result = comparator.compare(thisChunk, thatChunk);
                }

                if (result != 0)
                    return result;
            }

            return s1Length - s2Length;
        }

        private static class NaturalComparator implements Comparator<String> {
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        }
    }
Yishai
  • 90,445
  • 31
  • 189
  • 263
  • 10
    "Natural sort" is the widely-used term for sorting "image9.jpg" as being less than "image10.jpg". it is "natural" because it is how a human being would sort them, as opposed to the unnatural "ascii-betical" sorting that computers do by default. http://www.codinghorror.com/blog/archives/001018.html – Kip Aug 11 '09 at 19:28
  • I've updated the question to be more clear in this regard – Kip Aug 11 '09 at 19:39
  • 1
    In the codinghorror link you post, it has a link which gets to this, which has a Java implementation: http://www.davekoelle.com/alphanum.html – Yishai Aug 11 '09 at 20:57
  • How would you implement that the comparator ignores spaces? So that pic2 is placed after pic 1... – Lumis Sep 21 '12 at 10:59
  • @Lumis, I suggest you make that another question (as this one is over two years old) - it will get better attention. However, it depends on how you want to handle the edge case where two strings are equivalent without spaces, but are not equivalent with spaces, and the amount of strings to be sorted. – Yishai Sep 21 '12 at 16:58
  • @Yishai thanks, I've posted the new question: http://stackoverflow.com/questions/12536296/sorting-a-string-array-by-natural-order-and-ignore-spaces – Lumis Sep 21 '12 at 18:38
9

Have a look at this implementation. It should be as fast as possible, without any regular expressions or array manipulation or method calls, just a couple of flags and a lot of cases.

This should sort any combination of numbers inside strings and properly support numbers which are equal and move on.

public static int naturalCompare(String a, String b, boolean ignoreCase) {
    if (ignoreCase) {
        a = a.toLowerCase();
        b = b.toLowerCase();
    }
    int aLength = a.length();
    int bLength = b.length();
    int minSize = Math.min(aLength, bLength);
    char aChar, bChar;
    boolean aNumber, bNumber;
    boolean asNumeric = false;
    int lastNumericCompare = 0;
    for (int i = 0; i < minSize; i++) {
        aChar = a.charAt(i);
        bChar = b.charAt(i);
        aNumber = aChar >= '0' && aChar <= '9';
        bNumber = bChar >= '0' && bChar <= '9';
        if (asNumeric)
            if (aNumber && bNumber) {
                if (lastNumericCompare == 0)
                    lastNumericCompare = aChar - bChar;
            } else if (aNumber)
                return 1;
            else if (bNumber)
                return -1;
            else if (lastNumericCompare == 0) {
                if (aChar != bChar)
                    return aChar - bChar;
                asNumeric = false;
            } else
                return lastNumericCompare;
        else if (aNumber && bNumber) {
            asNumeric = true;
            if (lastNumericCompare == 0)
                lastNumericCompare = aChar - bChar;
        } else if (aChar != bChar)
            return aChar - bChar;
    }
    if (asNumeric)
        if (aLength > bLength && a.charAt(bLength) >= '0' && a.charAt(bLength) <= '9') // as number
            return 1;  // a has bigger size, thus b is smaller
        else if (bLength > aLength && b.charAt(aLength) >= '0' && b.charAt(aLength) <= '9') // as number
            return -1;  // b has bigger size, thus a is smaller
        else if (lastNumericCompare == 0)
          return aLength - bLength;
        else
            return lastNumericCompare;
    else
        return aLength - bLength;
}
OpenSauce
  • 8,533
  • 1
  • 24
  • 29
Panayotis
  • 1,792
  • 23
  • 32
2

How about using the split() method from String, parse the single numeric string and then compare them one by one?

 @Test
public void test(){
    System.out.print(compare("1.12.4".split("\\."), "1.13.4".split("\\."),0));
}


public static int compare(String[] arr1, String[] arr2, int index){
    // if arrays do not have equal size then and comparison reached the upper bound of one of them
    // then the longer array is considered the bigger ( --> 2.2.0 is bigger then 2.2)
    if(arr1.length <= index || arr2.length <= index) return arr1.length - arr2.length;
    int result = Integer.parseInt(arr1[index]) - Integer.parseInt(arr2[index]);
    return result == 0 ?  compare(arr1, arr2, ++index) : result;
}

I did not check the corner cases but that should work and it's quite compact

bennidi
  • 2,092
  • 21
  • 27
  • this is more limited: it only handles strings that are dot-delimited lists of integers. I might want to compare "user1album12photo4.jpg" to "user1album13photo4.jpg" – Kip Jul 05 '12 at 15:08
  • true...i only focused on the software version strings. sorry, but still, i personally like the solution – bennidi Jul 27 '12 at 12:48
1

It concats the digits, then compares it. And if it's not applicable it continues.

public int compare(String o1, String o2) {
if(o1 == null||o2 == null)
    return 0;
for(int i = 0; i<o1.length()&&i<o2.length();i++){
    if(Character.isDigit(o1.charAt(i)) || Character.isDigit(o2.charAt(i)))
    {
    String dig1 = "",dig2 = "";     
    for(int x = i; x<o1.length() && Character.isDigit(o1.charAt(i)); x++){                              
        dig1+=o1.charAt(x);
    }
    for(int x = i; x<o2.length() && Character.isDigit(o2.charAt(i)); x++){
        dig2+=o2.charAt(x);
    }
    if(Integer.valueOf(dig1) < Integer.valueOf(dig2))
        return -1;
    if(Integer.valueOf(dig1) > Integer.valueOf(dig2))
        return 1;
    }       
if(o1.charAt(i)<o2.charAt(i))
    return -1;
if(o1.charAt(i)>o2.charAt(i))
    return 1;
}
return 0;

}

xtf
  • 31
  • 7
1

Using RuleBasedCollator might be an option as well. Though you'd have to add all the sort order rules in advance so it's not a good solution if you want to take larger numbers into account as well.

Adding specific customizations such as 2 < 10 is quite easy though and might be useful for sorting special version identifiers like Trusty < Precise < Xenial < Yakkety.

RuleBasedCollator localRules = (RuleBasedCollator) Collator.getInstance();

String extraRules = IntStream.range(0, 100).mapToObj(String::valueOf).collect(joining(" < "));
RuleBasedCollator c = new RuleBasedCollator(localRules.getRules() + " & " + extraRules);

List<String> a = asList("1-2", "1-02", "1-20", "10-20", "fred", "jane", "pic01", "pic02", "pic02a", "pic 5", "pic05", "pic   7", "pic100", "pic100a", "pic120", "pic121");
shuffle(a);

a.sort(c);
System.out.println(a);
rednoah
  • 1,053
  • 14
  • 41
0

Might be a late reply. But my answer can help someone else who needs a comparator like this.

I verified couple of other comparators too. But mine seems bit efficient than others I compared. Also tried the one that Yishai has posted. Mine is taking only half of the time as the mentioned one for data of alphanumeric data set of 100 entries.

/**
 * Sorter that compares the given Alpha-numeric strings. This iterates through each characters to
 * decide the sort order. There are 3 possible cases while iterating,
 * 
 * <li>If both have same non-digit characters then the consecutive characters will be considered for
 * comparison.</li>
 * 
 * <li>If both have numbers at the same position (with/without non-digit characters) the consecutive
 * digit characters will be considered to form the valid integer representation of the characters
 * will be taken and compared.</li>
 * 
 * <li>At any point if the comparison gives the order(either > or <) then the consecutive characters
 * will not be considered.</li>
 * 
 * For ex., this will be the ordered O/P of the given list of Strings.(The bold characters decides
 * its order) <i><b>2</b>b,<b>100</b>b,a<b>1</b>,A<b>2</b>y,a<b>100</b>,</i>
 * 
 * @author kannan_r
 * 
 */
class AlphaNumericSorter implements Comparator<String>
{
    /**
     * Does the Alphanumeric sort of the given two string
     */
    public int compare(String theStr1, String theStr2)
    {
        char[] theCharArr1 = theStr1.toCharArray();
        char[] theCharArr2 = theStr2.toCharArray();
        int aPosition = 0;
        if (Character.isDigit(theCharArr1[aPosition]) && Character.isDigit(theCharArr2[aPosition]))
        {
            return sortAsNumber(theCharArr1, theCharArr2, aPosition++ );
        }
        return sortAsString(theCharArr1, theCharArr2, 0);
    }

    /**
     * Sort the given Arrays as string starting from the given position. This will be a simple case
     * insensitive sort of each characters. But at any given position if there are digits in both
     * arrays then the method sortAsNumber will be invoked for the given position.
     * 
     * @param theArray1 The first character array.
     * @param theArray2 The second character array.
     * @param thePosition The position starting from which the calculation will be done.
     * @return positive number when the Array1 is greater than Array2<br/>
     *         negative number when the Array2 is greater than Array1<br/>
     *         zero when the Array1 is equal to Array2
     */
    private int sortAsString(char[] theArray1, char[] theArray2, int thePosition)
    {
        int aResult = 0;
        if (thePosition < theArray1.length && thePosition < theArray2.length)
        {
            aResult = (int)theArray1[thePosition] - (int)theArray2[thePosition];
            if (aResult == 0)
            {
                ++thePosition;
                if (thePosition < theArray1.length && thePosition < theArray2.length)
                {
                    if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray2[thePosition]))
                    {
                        aResult = sortAsNumber(theArray1, theArray2, thePosition);
                    }
                    else
                    {
                        aResult = sortAsString(theArray1, theArray2, thePosition);
                    }
                }
            }
        }
        else
        {
            aResult = theArray1.length - theArray2.length;
        }
        return aResult;
    }

    /**
     * Sorts the characters in the given array as number starting from the given position. When
     * sorted as numbers the consecutive characters starting from the given position upto the first
     * non-digit character will be considered.
     * 
     * @param theArray1 The first character array.
     * @param theArray2 The second character array.
     * @param thePosition The position starting from which the calculation will be done.
     * @return positive number when the Array1 is greater than Array2<br/>
     *         negative number when the Array2 is greater than Array1<br/>
     *         zero when the Array1 is equal to Array2
     */
    private int sortAsNumber(char[] theArray1, char[] theArray2, int thePosition)
    {
        int aResult = 0;
        int aNumberInStr1;
        int aNumberInStr2;
        if (thePosition < theArray1.length && thePosition < theArray2.length)
        {
            if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray1[thePosition]))
            {
                aNumberInStr1 = getNumberInStr(theArray1, thePosition);
                aNumberInStr2 = getNumberInStr(theArray2, thePosition);

                aResult = aNumberInStr1 - aNumberInStr2;

                if (aResult == 0)
                {
                    thePosition = getNonDigitPosition(theArray1, thePosition);
                    if (thePosition != -1)
                    {
                        aResult = sortAsString(theArray1, theArray2, thePosition);
                    }
                }
            }
            else
            {
                aResult = sortAsString(theArray1, theArray2, ++thePosition);
            }
        }
        else
        {
            aResult = theArray1.length - theArray2.length;
        }
        return aResult;
    }

    /**
     * Gets the position of the non digit character in the given array starting from the given
     * position.
     * 
     * @param theCharArr /the character array.
     * @param thePosition The position after which the array need to be checked for non-digit
     *        character.
     * @return The position of the first non-digit character in the array.
     */
    private int getNonDigitPosition(char[] theCharArr, int thePosition)
    {
        for (int i = thePosition; i < theCharArr.length; i++ )
        {
            if ( !Character.isDigit(theCharArr[i]))
            {
                return i;
            }
        }
        return -1;
    }

    /**
     * Gets the integer value of the number starting from the given position of the given array.
     * 
     * @param theCharArray The character array.
     * @param thePosition The position form which the number need to be calculated.
     * @return The integer value of the number.
     */
    private int getNumberInStr(char[] theCharArray, int thePosition)
    {
        int aNumber = 0;
        for (int i = thePosition; i < theCharArray.length; i++ )
        {
            if(!Character.isDigit(theCharArray[i]))
            {
               return aNumber;
            }
            aNumber += aNumber * 10 + (theCharArray[i] - 48);
        }
        return aNumber;
    }
}
Kannan Ramamoorthy
  • 3,980
  • 9
  • 45
  • 63