0

I had done several different searches on SO looking for a simple solution to sorting mySQL results alphanumerically where some fields may have special characters present. The solution:

"SELECT *, REPLACE(title '\"', '') AS indexTitle ORDER BY indexTitle ASC";

In this case I'm searching for strings that begin with a double quote, escaped.

This probably wouldn't be a great solution where the types of special characters are not known, but for a simple sort it works nicely.

Hopefully this helps someone.

cbmtrx
  • 591
  • 8
  • 16
  • Just to be clear - you want, e.g., to sort the string `hello!`, `**world@#`, and `mysql$$$` as if they were `hello`, `world` and `mysql`, ignoring all non alpha-numeric characters? – Mureinik Nov 16 '15 at 18:35
  • 1
    But this "solution" does not actually sort the string in the correct order. It sorts by an order according to strings that have been altered. I would call this a primitive "workaround". And ask the question why you don't take care instead to have clean data in your database. Keep in mind how slow such a sorting strategy will perform. This cannot scale, since you have to make the replacement again and again for each `SELECT` statement and for each row. Imagine this for millions of rows! – arkascha Nov 16 '15 at 18:37
  • why is that a problem? you do a replace(), and order by that replace(). if you don't want to stort by the "new" version of the string, then don't tell the db to sort on that "new" version. – Marc B Nov 16 '15 at 18:39
  • @Mureinik If I have some fields beginning with alpha and others with quotes, the quotes are sorted first. Same thing, incidentally, with things like "A-O" or "A. something" preceding "AAA", for example. – cbmtrx Nov 16 '15 at 20:20
  • @arkascha Ah, would that it were all so easy! You must surely know by now that there are many, many reasons for data to be formatted a certain way. If you care to know why: in this particular instance, these are indexes for a publication; indexes can be names, words, things, concepts...and phrases, which appear in quotes. Many can also be qualified with extra details in parentheses. Thus, each index is formatted on its own merits for its own reasons. Sure, you could probably explode this into a whole different structure that would negate this problem, but it would be far more complicated. – cbmtrx Nov 16 '15 at 20:25
  • @cbmtrx Setting up a sane structure might be a bit more complex in the start, indeed. But it is required for an implementation that is robust and scales and certainly will safe you a lot of hassles later on. What is typically done in such situations is to create a separate "sorting index" which is updated when new entries are created. That sorting index holds the sortable variants of the actual strings. The creation of entries in such additional index can even be done in an automated manner within the database engine by means of triggers. Same applies for search indexes, btw. – arkascha Nov 16 '15 at 20:38
  • @arkascha In this case the problem itself is so minor (one quote) and the solution so simple that nothing else is needed. But this is good to know for the future. – cbmtrx Nov 17 '15 at 15:23

1 Answers1

0

One way to do this would be to write your own function to strip non-alphanumeric characters from a String. Google found me this example (I've not checked it!). Then you could write something like:

SELECT *, remove_non_alphanum_char_f(title) AS indexTitle ORDER BY indexTitle ASC;

Though of course as @arkascha has pointed out in the comments above this is slow and not scalable. A better solution is to go back a step and, if possible, ensure the data in your table is in the correct format to begin with. If you really need the special characters, it may be less of an overhead to add an extra column to your table which is the title column with the special characters stripped - then you could just order by that column. You could perform the stripping at the point when you insert into the table.

James K
  • 89
  • 1
  • 10
  • Yes, if my data had been any more complex I almost certainly would have gone that route. Like this answer here: http://stackoverflow.com/questions/6942973/mysql-how-to-remove-all-non-alpha-numeric-characters-from-a-string But for this particular circumstance I needed a simple method for removing one character, if present, and this method works nicely. – cbmtrx Nov 16 '15 at 20:29