
The standard Delphi RTL string comparison routines compare strings by ASCII ordering.

As far as I know, lexicographic ordering is based only on the letters of the alphabet; it is the ordering system used in dictionaries.

Is there a native Delphi function that compares strings in lexicographical order? For now, I don't need a complex solution that supports alphabets other than English.

UPDATE

I don't know the detailed rules of lexicographical ordering, but I do know one of them: this ordering treats, for example, a-b as greater than aa. However, that is based only on my observation of the English dictionaries I have at hand, so there may be other rules I'm not aware of.
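For the one rule stated above (a-b sorts after aa, i.e. the hyphen is ignored), a minimal sketch might look like the following. It assumes that "dictionary order" for English words means: drop everything that is not an ASCII letter, compare the remaining letters case-insensitively, and break ties with a plain ordinal comparison. The helpers `LettersOnly` and `DictCompare` are hypothetical names, not RTL functions, and real dictionaries may apply further rules.

```pascal
program DictCompareDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: keeps only the ASCII letters of S, upper-cased,
// so 'a-b' becomes 'AB' and '123456789 b' becomes 'B'.
function LettersOnly(const S: string): string;
var
  I: Integer;
  C: Char;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    C := UpCase(S[I]);
    if (C >= 'A') and (C <= 'Z') then
      Result := Result + C;
  end;
end;

// Hypothetical dictionary-style comparison for English text:
// compare letters only, case-insensitively; if they tie, fall back
// to an ordinal comparison so that distinct strings never compare equal.
function DictCompare(const S1, S2: string): Integer;
begin
  Result := CompareStr(LettersOnly(S1), LettersOnly(S2));
  if Result = 0 then
    Result := CompareStr(S1, S2);
end;

begin
  Writeln(CompareStr('a-b', 'aa') < 0);   // TRUE: '-' (#45) sorts before 'a' (#97)
  Writeln(DictCompare('a-b', 'aa') > 0);  // TRUE: 'AB' is compared with 'AA'
end.
```

With this definition, the example from the comments below also comes out as 'a' < '123456789 b' < 'c'. Stripping non-letters first is the simplest approach; a dictionary that treats digits or accented letters specially would need more rules.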

Astaroth
  • So, is this a Delphi (ANSI or Unicode?) or an FPC (ANSI) question? You're talking about Delphi all the time but in the end used FPC tags. Also, which platforms do you target? (There is a set of functions for what you need in MSVCRT, but you'd need to target Windows.) – TLama Jun 22 '14 at 07:55
  • It doesn't matter whether it's Delphi or FPC, ANSI, Unicode, or UTF, because I just want to look at the code implementation and then modify it to suit my needs. I'm talking about the English alphabet, not the alphabets of other languages. – Astaroth Jun 22 '14 at 08:07
  • You were talking about native functions, so MSVCRT's [`*coll`](http://msdn.microsoft.com/en-us/library/a7cwbx4t.aspx) functions were the first ones I came across. But you can use those only on the Windows platform. – TLama Jun 22 '14 at 08:32
  • The English alphabet has no collation issues, so lexicographical string ordering is the same as case-insensitive ordering; the `AnsiCompareText` function performs case-insensitive string comparison. – kludg Jun 22 '14 at 08:47
  • Do you mean you want your comparison function to return that `'a'` < `'123456789 b'` < `'c'` because the characters before the `'b'` aren't letters? If not, can you edit your question to give some concrete examples of how Delphi's own RTL functions don't meet your needs? –  Jun 22 '14 at 09:15
  • For simple definitions [see here](http://stackoverflow.com/questions/6810619/how-to-explain-sorting-numerical-lexicographical-and-collation-with-examples) – Jan Doggen Jun 22 '14 at 10:26
  • Your question makes no sense. ASCII only has English letters anyway. Did you really mean ASCII? What's more, the English letters appear in ASCII in alphabetical order. I think you need to step back and work out what you are actually looking for. – David Heffernan Jun 23 '14 at 06:46
  • @David Heffernan, I don't know the detailed rules of lexicographical ordering, but I do know one of them: this ordering treats, for example, a-b as greater than aa. However, that is based on my observation of the English dictionaries I have at hand, so there may be other rules I'm not aware of. That is why I'm asking here. – Astaroth Jun 23 '14 at 15:45

1 Answer


The AnsiCompareText function performs a case-insensitive string comparison that takes the collation order of the system locale into account.

Just to be sure, I ran the following test on a system with code page 1251 (Cyrillic):

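```pascal
program CollationTest;

{$APPTYPE CONSOLE}

uses
  SysUtils;

procedure Test;
var
  S1, S2: string;
begin
  S1 := 'им';
  S2 := 'ём';
  // Code points of the first letters: 'и' = U+0438, 'ё' = U+0451
  Writeln(IntToHex(Ord(S1[1]), 4));    // 0438
  Writeln(IntToHex(Ord(S2[1]), 4));    // 0451
  // Locale-aware, case-insensitive comparison
  Writeln(AnsiCompareText(S1, S2));    // 1 (means S1 > S2)
end;

begin
  Test;
end.
```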

You can see that the letter 'и' has a lower code (0x0438) than the letter 'ё' (0x0451), but in the Russian alphabet 'ё' precedes 'и', and AnsiCompareText compares 'ё' and 'и' according to the rules of the Russian alphabet, not according to the numeric values of their codes.
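As a sketch of putting AnsiCompareText to work for the original question, the following sorts an English word list with `TStringList.CustomSort`. The comparer name `CompareByLocale` is hypothetical. For letters outside English the resulting order depends on the system locale, as the test above shows; for plain English words it amounts to case-insensitive alphabetical order.

```pascal
program SortWordsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

// Hypothetical comparer for TStringList.CustomSort: delegates to
// AnsiCompareText, i.e. a case-insensitive, locale-aware comparison.
function CompareByLocale(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := AnsiCompareText(List[Index1], List[Index2]);
end;

var
  Words: TStringList;
  I: Integer;
begin
  Words := TStringList.Create;
  try
    Words.Add('cherry');
    Words.Add('Apple');
    Words.Add('banana');
    Words.CustomSort(CompareByLocale);
    for I := 0 to Words.Count - 1 do
      Writeln(Words[I]);   // prints Apple, banana, cherry (one per line)
  finally
    Words.Free;
  end;
end.
```

Note that this does not ignore punctuation the way the hyphen rule in the question requires; if that matters, the comparer is the single place to change.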

kludg