1

I have an algorithm that sorts words alphabetically by the numeric value of their letters. This all works fine until I include å, ä and ö, because they return int values ranging from -103 to -124. Because of this, the words end up ordered like ä å ö a, for example, when it should be a å ä ö. So how do I make it sort correctly, with å ä ö last?

Edit: I'm not allowed to use fancy functions, which is why this code is so bare-bones. I'm also using `using namespace std`.

My code:

pali is a vector of type string that I use to store the words

void SortPal() {
    int antal = pali.size();
    string tempO;
    bool byte = false;

    for (int i = 0; i < antal - 1; i++) { // go through every word in the vector
        if (int(pali[i][0]) > int(pali[i + 1][0])) {
            tempO = pali[i];
            pali[i] = pali[i + 1];
            pali[i + 1] = tempO;
            i = -1;
        }
        else if (int(pali[i][0]) == int(pali[i + 1][0])) { // if the first letter is the same, check the following ones
            int minsta = pali[i].size();
            if (minsta > pali[i + 1].size()) {
                minsta = pali[i + 1].size();
            }
            for (int a = 1; a < minsta-1; a++){
                if (int(pali[i][a]) > int(pali[i + 1][a])) { // swap if any letter after the first is smaller than the letters in the second word
                    tempO = pali[i];
                    pali[i] = pali[i + 1];
                    pali[i + 1] = tempO;
                    i = -1;
                    byte = true;
                    break;
                }
            }
            if (byte == false && pali[i].size() > pali[i + 1].size()) { // swap if pali[i + 1] is shorter than pali[i]
                tempO = pali[i];
                pali[i] = pali[i + 1];
                pali[i + 1] = tempO;
                i = -1;
            }
        }
    }
}

Rama
  • There are three places in this code alone that you have repeated swapping logic. That's nine lines that can be replaced by three calls to `std::swap`. I also see a reimplementation of `std::min` in there. – chris Apr 10 '17 at 13:01
  • Have you looked at the input given here: http://stackoverflow.com/q/4611302/1025391 ? – moooeeeep Apr 10 '17 at 13:01
  • Also see: http://stackoverflow.com/questions/1357374/locale-dependent-ordering-for-stdstring – NathanOliver Apr 10 '17 at 13:02
  • I'm not allowed to use some algorithms for this assignment; it should be as raw as possible. Otherwise I would just have used sort. – Andre Nordlund Apr 10 '17 at 13:31
  • every program must be programmed using some algorithms. Without algorithms how can it run? – phuclv Apr 10 '17 at 13:54
  • You are going to want a function that compares two letters according to the rules of your language (Swedish?). Numerically comparing the encoding only (half) works in English. As a bonus, such a function can be used as the final argument to std::sort. – Caleth Apr 10 '17 at 13:59
  • Unless your instructor specifically wants you to support these letters in this specific order you mention, in this specific encoding, **don't bother**. – n. m. could be an AI Apr 10 '17 at 14:53
  • Sadly he did. I would much rather add something that stops the input of these characters, which would be much easier. – Andre Nordlund Apr 10 '17 at 16:00

2 Answers

2

Generally speaking, there's no relationship between the alphabetical order of letters in any given language and numerical codes assigned to said letters in any given character set. In order to compare strings according to the alphabetical order of a given language (or more generally the collation order of the current locale), C has a special function called strcoll.

In order to use it, you need to set up your locale accordingly. Unfortunately, locale names are not standard in C. If you are on Windows, the linked example is unlikely to work.
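For reference, a minimal sketch of what the strcoll route could look like. The locale name "sv_SE.UTF-8" is an assumption (it is a common name on Linux, other systems use different names), and the helper names `use_swedish_collation` and `locale_before` are made up for this illustration. The text being compared must also be in the encoding the locale expects.

```cpp
#include <clocale>
#include <cstring>
#include <string>

// Try to switch string collation to Swedish.
// ASSUMPTION: "sv_SE.UTF-8" is only one possible locale name; it is not
// guaranteed to exist. Returns false if the system does not provide it,
// in which case the previous locale stays active.
bool use_swedish_collation() {
    return std::setlocale(LC_COLLATE, "sv_SE.UTF-8") != nullptr;
}

// True if a sorts before b under the collation rules of the current locale.
bool locale_before(const std::string& a, const std::string& b) {
    return std::strcoll(a.c_str(), b.c_str()) < 0;
}
```

If `use_swedish_collation()` returns false, `locale_before` still works but falls back to whatever locale was active before, usually the plain "C" locale that compares raw character codes.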

This is what you should be using in real software. It matters little for your assignment since you are not supposed to use fancy library functions. You need to implement a function similar to strcoll yourself, and it only needs to work for your language.

In a language where each character has its own place in the alphabet, this function is simple: write a function that takes a character and returns its place in the alphabet (e.g. for 'a' return 1, for 'b' return 2, ..., for 'å' return 27, for 'ä' return 28, ...). Then compare the strings according to the numbers returned by this function. This may or may not take letter case into account, depending on the exact sort order you want.

If you don't want to write a big switch, you can use the fact that letters that are in ASCII are already ordered as you want, you only need to fix the order of three additional letters. So you can write something like this:

int collation_order(int ch) {
  switch (ch) {
     case 'Å':  return 'Z'+1;
     case 'å':  return 'z'+1;
     case 'Ä':  return 'Z'+2;
     case 'ä':  return 'z'+2;
     case 'Ö':  return 'Z'+3;
     case 'ö':  return 'z'+3;
     default :  return ch;
  }
}

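int my_strcoll (const char* p, const char* q)
{
  // Compute both ranks before the loop so they are always initialized,
  // even when p is an empty string.
  int pp = collation_order(*p), qq = collation_order(*q);
  while (*p && pp == qq) {
    p++; q++;
    pp = collation_order(*p);
    qq = collation_order(*q);
  }
  // Negative if p sorts first, positive if q sorts first, 0 if equal.
  return pp - qq;
}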

Of course this means that non-alphabetic characters that come after Z/z in the ASCII table will get sorted incorrectly. If you want to sort those after Ö/ö, you need to extend collation_order accordingly. Try doing this without resorting to a case for each individual character.
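One possible way to do that is sketched below: shift every code after 'Z' up by 3 and every code after 'z' up by 6, which frees three slots after each of them. The byte values are an assumption: they are the code page 437/850 encodings of the Swedish letters, which is what the -103 to -124 range in the question suggests. The name `collation_order_shifted` is made up for this example.

```cpp
// ASSUMPTION: the text is in code page 437/850 (typical for a Windows
// console). Other single-byte encodings need different values here.
const int UPPER_A_RING = 0x8F;  // Å
const int UPPER_A_UML  = 0x8E;  // Ä
const int UPPER_O_UML  = 0x99;  // Ö
const int LOWER_A_RING = 0x86;  // å
const int LOWER_A_UML  = 0x84;  // ä
const int LOWER_O_UML  = 0x94;  // ö

// Returns a sort rank in which Å Ä Ö follow directly after Z, and
// å ä ö follow directly after z, without colliding with '[' or '{'.
int collation_order_shifted(char ch) {
    int c = static_cast<unsigned char>(ch);  // 0..255, never negative
    switch (c) {
        case UPPER_A_RING: return 'Z' + 1;
        case UPPER_A_UML:  return 'Z' + 2;
        case UPPER_O_UML:  return 'Z' + 3;
        case LOWER_A_RING: return 'z' + 4;  // 'z' itself has moved to 'z' + 3
        case LOWER_A_UML:  return 'z' + 5;
        case LOWER_O_UML:  return 'z' + 6;
    }
    if (c > 'z') return c + 6;  // comes after both inserted groups
    if (c > 'Z') return c + 3;  // comes after the uppercase group only
    return c;
}
```

With this numbering '{' ends up after ö and '[' ends up after Ö, while everything up to 'Z' keeps its original code.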

Another way to write collation_order is to use character codes (cast to unsigned char) as indices in an array of 256 integer elements.
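A rough sketch of the table variant, under the same assumption that the text is in code page 437/850 (adjust the six byte values for any other encoding). The names `order_table`, `build_order_table`, `table_order` and `comes_before` are invented for this example. Multiplying every code by 4 is just one way to leave free slots after each character.

```cpp
#include <string>

// Sort rank for each of the 256 possible byte values.
static int order_table[256];

// Fill the table once before sorting. Every byte gets 4 * its code, which
// leaves three unused ranks after each character; the Swedish letters are
// placed in the gaps right after 'Z' and 'z'.
// ASSUMPTION: code page 437/850 byte values for the six letters.
void build_order_table() {
    for (int i = 0; i < 256; i++) {
        order_table[i] = 4 * i;
    }
    order_table[0x8F] = 4 * 'Z' + 1;  // Å
    order_table[0x8E] = 4 * 'Z' + 2;  // Ä
    order_table[0x99] = 4 * 'Z' + 3;  // Ö
    order_table[0x86] = 4 * 'z' + 1;  // å
    order_table[0x84] = 4 * 'z' + 2;  // ä
    order_table[0x94] = 4 * 'z' + 3;  // ö
}

// Look up the rank; the cast keeps the index in 0..255 even if char is signed.
int table_order(char ch) {
    return order_table[static_cast<unsigned char>(ch)];
}

// True if word a should be placed before word b.
bool comes_before(const std::string& a, const std::string& b) {
    std::string::size_type shortest = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i < shortest; i++) {
        int x = table_order(a[i]);
        int y = table_order(b[i]);
        if (x != y) return x < y;
    }
    return a.size() < b.size();  // equal so far: the shorter word goes first
}
```

In a hand-written sort like the one in the question, the separate character comparisons could then be replaced by a single check such as `comes_before(pali[i + 1], pali[i])` to decide whether to swap.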

Also please note that old 8-bit encodings are old and should not be used for serious new development. For more information, read this.

n. m. could be an AI
-1

Since your options are constrained and you can also constrain your input to a foreseeable universe, I'd suggest you use a simple parser function that maps non-ASCII characters to the places where you know they should go:

int parse_letter( int source )
{
    switch( source )
    {
        case 'å':
        case 'ä':    return 'a';
        case 'ö':    return 'o';
        // as many cases as needed...
        default:     return source;
    }
}
j4x
  • I asked my teacher if I could just add code to stop the input of å ä ö, and he said I should rather make the sorting function work correctly with them. But won't this just return the value of a and o? In that case å ä ö won't even end up in the list? – Andre Nordlund Apr 10 '17 at 16:00
  • You can use the parser function only during comparisons and store the original data once classified. This way you'll not lose any information. – j4x Apr 10 '17 at 16:10
  • The correct order is probably a, b, ..., z, å, ä, ö. – n. m. could be an AI Apr 11 '17 at 10:43
  • Why would you consider "å" and "ä" as an "a"? – Phil Feb 25 '22 at 23:32
  • Hi @Phil. I know that Germanic languages handle those characters as separate letters but this is not my choice to flatten them as done for Latin languages. You can find multiple references around. E.g: https://ux.stackexchange.com/a/115620 https://english.stackexchange.com/a/212630 https://en.wikipedia.org/wiki/Alphabetical_order Nevertheless, the function I suggested is simple and flexible enough to return anything you want. If you are not happy to map your characters removing diacritics, you can simply return something like `'z' + 1`, `'z' + 2`, `'z' + 3`. – j4x Mar 02 '22 at 12:17