3

Firstly I want to point out that I have tried almost everything. I am trying since last 8 hours to make my list in order, and I have applied dozen of solutions found here.

Here is SQL Fiddle with the sample data. I have found a page that manages to sort my list in the right order, and that is:

1
2
2.B3
5
9
10 A-1
10 A-3
10 B-4
10 B-5
11
12
B3-43
B3-44
B3 - 48
B3 - 49
Basztowa 3
Basztowa 4
Basztowa 5
Basztowa 7
Basztowa 9
D.1
D.2
D.10
D.11
D.12
Kabaty ul. Pod lipą 4

But I am not able to reproduce this using MySQL.

I would appreciate any help as I have no more ideas. I consider using PHP to sort my list but as far as I know DBMS are optimized for this kid of operations so if it's possible I would like to avoid doing this using PHP.

@UPDATE

Thanks to @Jakumi I have created two functions that helps me to solve my problem.

You need to create a column to store your values in sort-friendly format (zeropadded_name), create trigger on update and insert to fill zeropadded_name when name changes and that's all! Now just order by zeropadded_name and enjoy!

Helper functions

  1. regex_replace - Its task is to help us sanitize value by removing all non-alphanumeric characters.
  2. lpad_numbers - pads every number in our string. It's a bit ugly, as I don't know MySQL functions much, but hey, it works, quite fast.

Example:

SELECT lpad_numbers(regex_replace('[^a-zA-Z0-9]', ' ', 'B3 - A-5'));
#B0003A0005

DROP FUNCTION IF EXISTS regex_replace;
CREATE FUNCTION `regex_replace`(
  pattern     VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci,
  replacement VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci,
  original    VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci
) RETURNS varchar(1000) CHARSET utf8
    DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000)
    CHARSET utf8
    COLLATE utf8_polish_ci;
    DECLARE ch VARCHAR(1)
    CHARSET utf8
    COLLATE utf8_polish_ci;
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    IF original REGEXP pattern
    THEN
      loop_label: LOOP
        IF i > CHAR_LENGTH(original)
        THEN
          LEAVE loop_label;
        END IF;
        SET ch = SUBSTRING(original, i, 1);
        IF NOT ch REGEXP pattern
        THEN
          SET temp = CONCAT(temp, ch);
        ELSE
          SET temp = CONCAT(temp, replacement);
        END IF;
        SET i = i + 1;
      END LOOP;
    ELSE
      SET temp = original;
    END IF;
    RETURN temp;
  END;

DROP FUNCTION IF EXISTS lpad_numbers;
CREATE FUNCTION `lpad_numbers`(str VARCHAR(256)) RETURNS varchar(256) CHARSET utf8 COLLATE utf8_polish_ci
BEGIN
    DECLARE i, len SMALLINT DEFAULT 1;
    DECLARE ret VARCHAR(256) DEFAULT '';
    DECLARE num VARCHAR(256) DEFAULT '';
    DECLARE c CHAR(1);

    IF str IS NULL
    THEN
      RETURN "";
    END IF;

    SET len = CHAR_LENGTH(str);
    REPEAT
      BEGIN
        SET c = MID(str, i, 1);
        IF c BETWEEN '0' AND '9'
        THEN
          SET num = c;
          SET i = i + 1;
          REPEAT
            BEGIN
              SET c = MID(str, i, 1);
              SET num = CONCAT(num, c);
              SET i = i + 1;
            END;
          UNTIL c NOT BETWEEN '0' AND '9' END REPEAT;
          SET ret = CONCAT(ret, LPAD(num, 4, '0'));
        ELSE
          SET ret = CONCAT(ret, c);
          SET i = i + 1;
        END IF;
      END;
    UNTIL i > len END REPEAT;
    RETURN ret;
  END;
Kamil Latosinski
  • 756
  • 5
  • 28
  • so you want to get the data out of the database in alphabetical order? – Mark Twigg Aug 18 '16 at 15:30
  • Well, we can call this alphabetical order. Regular order by name doesn't work cause 2 goes after 11, etc. – Kamil Latosinski Aug 18 '16 at 15:35
  • What's the common pattern that the data share? There are dots/hyphen/space etc. – 1000111 Aug 18 '16 at 15:37
  • 1
    Possible duplicate of [MySQL 'Order By' - sorting alphanumeric correctly](http://stackoverflow.com/questions/8557172/mysql-order-by-sorting-alphanumeric-correctly) – Jester Aug 18 '16 at 15:37
  • The point is, your order is not really alphanumeric (lexicographic, actually), since 10 must technically come after 1, since they are all strings. Also the page you linked does put `B3 - 48` before `B3-43` – Jakumi Aug 18 '16 at 15:38
  • from @Jester s link: `ORDER BY CAST(name AS UNSIGNED), name` ... however, there are of course edge cases where it absolutely won't work. What you would have to do is extract something that can actually be sorted. like ... make multiple columns num1, alpha1, num2, alpha2, ... and sort successively ... – Jakumi Aug 18 '16 at 15:40
  • `order by LENGTH(replace(' ', '', field)), replace(' ', '', field)` – Mark Baker Aug 18 '16 at 15:41
  • Well all non-alphanumeric chars can be replaced with space, so we can skip this issue. – Kamil Latosinski Aug 18 '16 at 15:42
  • @Jakumi i tried it, it prefers characters over numbers however – Jester Aug 18 '16 at 15:43
  • 1
    @KamilLatosinski well, when they are replaced with space, then there's still the issue of space vs. non-space and still that 10 is < 2 in lexicographic/alphabetic order. I really think you should generate another column which contains a "sanitized" version of your names that can be sorted correctly. – Jakumi Aug 18 '16 at 15:47
  • You want something like [this](http://sqlfiddle.com/#!9/968fe/82/0) @KamilLatosinski? – frz3993 Aug 18 '16 at 16:03
  • @frz3993 this will probably fail with "D1A" < "D10" which the query would turn to "D10" < "D1A". this specific line is not included in op's example ... but nevertheless – Jakumi Aug 18 '16 at 16:21
  • @Jakumi, seems it is easier using PHP with something like `natsort()`. – frz3993 Aug 18 '16 at 16:57

2 Answers2

2

splitting according to underlying structure

Technically, the mysql sorting mechanism works correctly but your strings are formatted in the wrong way. The underlying structure of your data is something like the following (Original column kept for ease of association to the example):

alpha1   num1 alpha2 num2 ...   Original      
            1                   1             
            2                   2             
            2      B    3       2.B3          
            5                   5             
            9                   9             
           10      A    1       10 A-1        
           10      A    3       10 A-3        
           10      B    4       10 B-4        
           10      B    5       10 B-5        
           11                   11            
           12                   12            
B           3          43       B3-43         
B           3          44       B3-44         
B           3          48       B3 - 48       
B           3          49       B3 - 49       
Basztowa    3                   Basztowa 3    
Basztowa    4                   Basztowa 4    
Basztowa    5                   Basztowa 5    
Basztowa    7                   Basztowa 7    
Basztowa    9                   Basztowa 9    
D           1                   D.1           
D           2                   D.2           
D          10                   D.10          
D          11                   D.11          
D          12                   D.12          

If you would sort them now with ORDER BY alpha1, num1, alpha2, num2 they would be sorted as you want them. But the already "formatted" version (the Original column) cannot be sorted easily, because the parts that shall be sorted alphabetically and the parts that shall be sorted numerically are mixed together.

zeropadding

There is a somewhat less extensive alternative needing only one extra column where you assume no number ever goes beyond let's say 10000 and you can now replace every number (not digit!) with a zero-padded version, so 10 A-1 would become 0010A0001 (which is 0010 and A and 0001, obviously), but I don't see this being made on-the-fly in an ORDER BY statement.

But for this example, the zeropadded version (Assumption: every number < 10000):

Original      Zeropadded 
1             0001       
2             0002       
2.B3          0002B0003  
5             0005       
9             0009       
10 A-1        0010A0001  
10 A-3        0010A0003  
10 B-4        0010B0004  
10 B-5        0010B0005  
11            0011       
12            0012       
B3-43         B00030043  
B3-44         B00030043  
B3 - 48       B00030048  
B3 - 49       B00030049  
Basztowa 3    Baztowa0003
Basztowa 4    Baztowa0004
Basztowa 5    Baztowa0005
Basztowa 7    Baztowa0007
Basztowa 9    Baztowa0009
D.1           D0001      
D.2           D0002      
D.10          D0010      
D.11          D0011      
D.12          D0012      

This would be sortable to your wishes with ORDER BY zeropadded.

So in the end, you probably have to sort in php or create more columns that help you sort via reformatting/sanitizing/splitting your input.

update

zeropadding explained (simplified)

The main idea behind zeropadding is that the natural format of numbers is different from their format in the computer. In the computer the number 2 is effectively the sequence of digits 0..0002 (so the leading zeros are included) similar 10 (0..0010). When the computer compares numbers, it will go from left to right until it finds different digits:

0...0002
0...0010
======!.    (the ! marks the point where the first digit is different)

And then it will determine which digit is bigger or smaller. In this case 0 < 1, and therefore 2 < 10. (Of course the computer uses binary, but that doesn't change the idea).

Now, a string is technically a sequence of characters. String comparison works slightly differently. When two strings are compared, they are not (left) padded, so the first character of each string is really the first character and not a padding (like a space for example). So technically the string A10 is a sequence of characters A, 1 and 0. And since the string comparison is used, it is "smaller" than A2, because the string comparison doesn't see the numbers as numbers but as characters (that are digits):

A10
A2
=!     (the ! marks the point where the first character is different)

and because 1 < 2 as characters, A10 < A2. Now to circumvent this problem, we force the format of numbers in the string to be the same as it would be in numerical comparisons, by padding the numbers to the same length which is aligning the digits according to their place value:

A0010
A0002
===!.  (the ! marks the point where the first character is different)

Now it's effectively the same comparison you would expect in numerical comparisons. However, you have to make some assumption about the maximal length of numbers, so that you can choose the padding appropriately. Without that assumption, you'd have a problem.

The only (logical) point that remains: When the compared string has an alphabetical character where the other has a number, what does the padding change? The answer is: Nothing. We don't change numbers into letters, and numbers are smaller than letters, so everything stays in the same order in that case.

The effect of zeropadding is: We adjust the "number" comparison in strings to be similar to the real number comparison by aligning the digit characters according their value.

Community
  • 1
  • 1
Jakumi
  • 8,043
  • 2
  • 15
  • 32
  • I have adjusted your solution to my needs. As my true real estate name is both building symbol - name it would be nightmare for me to do this using DBMS, so I have created php script that populates this column with desired name and from now on on every update I will calculate new zeropadded name. Works like a charm! Can you briefly explain why this works or link to some reading on this topic? – Kamil Latosinski Aug 19 '16 at 06:08
  • 1
    @KamilLatosinski I added a somewhat lengthy explanation ;o) – Jakumi Aug 19 '16 at 07:32
-1
SELECT name FROM realestate ORDER BY name ASC;

This should sort your list in alphanumeric data... I don't see the issue.

EDIT: OK, I still don't know if I really understood what is the goal of this issue (is it for a contest?), but I can submit this "twisted" query (that I hope I will never use in my career):

SELECT name FROM realestate
ORDER BY IF(SUBSTRING(name, 1, 2) REGEXP '[A-Z]', 100000, CAST(name AS UNSIGNED)) ASC,
SUBSTRING(name, 1, 2) ASC,
CAST(SUBSTRING(name FROM LOCATE('.', name)+1) AS UNSIGNED) ASC,
REPLACE(name, ' ', '') ASC;

Maybe someone can find an easier way, because I admit my answer is a bit complicated. BUT, Kamil and Jakumi solutions are much more tricky and complicated.

FragBis
  • 73
  • 6
  • gives `10` before `2` – Mark Baker Aug 18 '16 at 15:42
  • this doesn't provide the results that he gave in his question. – Jester Aug 18 '16 at 15:45
  • in alpha order 10 is before 2 because 1 is before 2. You have to cast the column into INT or DECIMAL intending to have 2 before 10 – FragBis Aug 18 '16 at 15:46
  • I've edited my answer with a query (the 2nd one) that gives exactly the order he gave in his question and only with ONE mysql query, ordering the result. I also think that the answer Jakumi gave is way more complicated. But as you wish. – FragBis Aug 22 '16 at 15:00