5

I am currently reading about the std::next_permutation function and encountered the term "lexicographical order". At the particular time, I had no experience with this term whatsoever so did a google search for this and found only somewhat cryptic definitions of this type of order, including the wiki article (at least they are to me).

So could anybody try to help me understand this? What is a "good" definition of this term to you?

Regarding the wiki article - they claim that lexicographical order is also known as alphabetical order but as I continue reading, I get the understanding that they are not the same. Thus, the ongoing comparison confuses me a bit.

  • Handy reading: http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare – user4581301 Nov 24 '17 at 19:16
  • That you found the Wikipedia article "cryptic" is not exactly a helpful problem description. Stack Overflow is for specific problems, not for competing with Wikipedia articles on general computer-science subjects. – Christian Hackl Nov 24 '17 at 19:18
  • I've given it some thought, and I believe the Wikipedia article really could use some rewording. It should probably emphasise the generalisation aspect before introducing "alphabetical order" as a synonym. – Christian Hackl Nov 24 '17 at 19:30
  • @step If your question pertains to permutations of characters then I hope the linked question resolves it for you. Otherwise, please [edit] to clarify so we can re-open it to answer your question in relation to `std::next_permutation`. – Tom Blodget Nov 24 '17 at 23:16

3 Answers3

12

In normal English usage, when we sort words alphabetically, we employ two rules:

  • If two words have the same first letter, we compare the second. If the second letters are the same, we compare the third, etc. Finally, one word comes before the other if the first differing letter comes before the corresponding letter.

  • If two words are identical up to the length of the shorter word, the shorter word comes first.

So "Tom" comes before "Tooth". The first letters are identical ("T"), the second letters are identical "o", but the third letters diff and "m" comes before "o". Therefore "Tom" comes before "Tooth".

"Tom" comes before "Tomas" because the two words are identical through the first three letters "Tom" and "Tom" is shorter than "Tomas".

Lexicographic order is simply alphabetic ordering, generalized for non-letter values. Consider a sequence of values, not necessarily letters:

(1,5,10) comes before (1,6,3) because "5" comes before "6".

(1,5,10) comes before (1,5,10,15,20) because (1,5,10) is shorter than (1,5,10,15,20).

Lexicographic ordering is particularly useful if the elements of the sequence have some specific meaning, with the earlier values giving a higher precedence. For example, consider these times: 9:13 AM and 8:25 AM. If we represent these with the sequence (9,13) and (8,25), then (8,25) comes before (9,13) because 8 comes before 9. What if the hours are the same? For example, (9,13) comes before (9,45) because 13 comes before 45. As you can see, lexicographic ordering allows the hour field to have a higher precedence than the minute field.

Robᵩ
  • 163,533
  • 20
  • 239
  • 308
  • Thanks, nice answer! So nothing would be wrong with saying that lexical order simply is alphabetical order with numbers included? Also, are unicode characters also applicable in an lexicographical ordering? –  Nov 24 '17 at 20:22
  • @step. I would say there is something wrong with it. In terms of characters, "The Alphabet" is an ordered subset of letters from a writing system for a language established by a language academy or convention. So it entirely depends on the language and a chosen script for it. If you were to specify these then you would be defining a lexicographic ordering (for those letters) Normally, you would just specify a character set, which has a codepoint order, or a character encoding, which has a code unit order. In the case of Unicode, not always the same order: >$but "\uD83D\uDEB2"<"\uFF04". – Tom Blodget Nov 27 '17 at 01:55
3

Most of the out of the box string sorting algorithms are implemented as lexicographic sort. (More Details at the Bottom)

Example 1:

Random Elements:

['A','a','a','B','b','C','c','d','E']

Sorted with lexicographic ordering:

['A','B','C','E','a','a','b','c','d']

Example 2:

Random Elements with different length:

['a', 'b', 'aa', 'c', 'ddd', 'f']

Sorted with lexicographic ordering:

['a', 'aa', 'b', 'c', 'ddd', 'f']

Difference between lexicographic and natural sort

input = ["z1.txt", "z10.txt", "z3.txt", "z100.txt", "z101.txt"]

lexicogrpahic : ['z1.txt', 'z10.txt', 'z100.txt', 'z101.txt', 'z3.txt']
natural: ['z1.txt', 'z3.txt', 'z10.txt', 'z100.txt', 'z101.txt']

We could go into details here, but a lot of great people already created great explanations for that:

1) Does Python have a built in function for string natural sort?

2) https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

user1767754
  • 23,311
  • 18
  • 141
  • 164
2

In layman's terms, this means alphabetic order. In practice, you'd be sorting the strings character by character, according to their underlying numerical (usually ASCII) representation.

Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • "underlying numerical representation" - could this be the bit representation? –  Nov 24 '17 at 19:24
  • @step yup, that's it – Mureinik Nov 24 '17 at 19:25
  • Ahhh thanks! This is a very comprehensible definition. Is this implied/stated anywhere in the wiki article? I cannot find it. –  Nov 24 '17 at 20:00
  • Yeah, except in the context of permutations, "alphabet" has the [mathematical meaning](https://www.encyclopediaofmath.org/index.php/Alphabet) so the result is not really in layman's terms. And, it's a special case that the set of objects be characters and an even more special case that the characters be encoded in ASCII. But, if it helps in the progression of understanding, that's great. – Tom Blodget Nov 24 '17 at 23:10
  • @TomBlodget Thanks for the input. So are you claiming that the sets of objects to be arranged in order are usually not characters? What else would it be? I mean both letters and numbers are considered characters, and if so, what else would you arrange? –  Nov 25 '17 at 00:08
  • @step A digit is a character. Most numbers cannot be represented by one character. I don't think I've ever used permutations of characters (except when playing Scrabble, creating anagrams, etc). And, although there are over ten thousand letters in the computer world, we might want some permutations from a larger set. The C++ standard library is very meticulous about separating algorithms from values. `std::next_permutation` basically requires a bidirectionally iterable sequence of objects; You can use the `operator <` for two objects, if the type defines one, or you can provide your own. – Tom Blodget Nov 25 '17 at 19:10
  • @seth My most recent use of permutations was over a multi-set of box sizes. (A multi-set is a set of values that can be repeated. [std::multiset](http://www.cplusplus.com/reference/set/multiset/)) – Tom Blodget Nov 25 '17 at 19:13
  • @TomBlodget Thank you! –  Nov 27 '17 at 11:13