0

What I would like to do is to sort a list of textual identifiers (think of a file name for example). The sorting algorithm I'm looking for is a kind of alphabetic sorting, but taking groups into account.

For example, sorting "D1" to "D21" should be:

D1, D2, D3, ..., D21

And not:

D1, D10, D11, D12, ... D2, D20, D21, D3, ...

I've been a lot of time trying to accomplish such way of sorting but still can't find how to do it. The bigger the group the harder it seems.

Is there anyone that could guide me through or give me some pseudo code or code in any language?

Thanks.

Shantia
  • 1,129
  • 2
  • 9
  • 9

5 Answers5

3
  • Detect the position the strings differ first.
  • If both values at that position are numbers:
    • read forward, one character at at time, until you find a non-number in either string.
    • If only one string is non-numeric, it is "smaller"
    • If Both strings are non-numeric at the same point, compare numbers at the earlier detected position
  • Otherwise
    • compare alphabetically the letters at the position
gnarf
  • 105,192
  • 25
  • 127
  • 161
2

Jeff has a post on this with links to several implementations. It appears to be called "natural sorting".

See also this question.

Cœur
  • 37,241
  • 25
  • 195
  • 267
John Fouhy
  • 41,203
  • 19
  • 62
  • 77
1

The Unix sort utility has a 'numeric sort' option -n which sorts the way you want, rather than a plain dictionary sort.

Maybe look for information about 'numeric sort'

The Unix source code of sort will be available, but probably difficult to follow.

pavium
  • 14,808
  • 4
  • 33
  • 50
1

PHP: the nat*sort family of functions (natsort, natcasesort, ...)

Perl:

sub natcmp {
    # from http://www.perlmonks.org/?node_id=540890
    my @a = split /(\d+)/, (shift or $a);
    my @b = split /(\d+)/, (shift or $b);

    my $last = min(scalar @a, scalar @b)-1;
    my $cmp;
    for my $i (0 .. $last) {
        unless($i & 1) {  # even
            $cmp = lc $a[$i] cmp lc $b[$i] || $a[$i] cmp $b[$i] and return $cmp;
        } else {  # odd
            $cmp = $a[$i] <=> $b[$i] and return $cmp;
        }
    }
    return scalar @a <=> scalar @b;  # shortest array comes first
}

# ...

@sorted_array = sort natcmp @array;

What you're looking for is called a natural sort.

We Are All Monica
  • 13,000
  • 8
  • 46
  • 72
1

Others have pointed out that it's called "natural sort". For code, this Python implementation is the simplest I've found. It basically uses a regex to split each string into heterogeneous lists of ints or strings, and then compares those.

This takes advantage of a couple of Python features w.r.t. object comparison. First, that you can compare ints and strings directly (ints sort as less-than any string). Second, that you can compare heterogeneous lists (it compares elements from the beginning until it finds a difference).

Ken
  • 1,066
  • 1
  • 10
  • 17