1

I'm using php's natural sorting algorithm natsort but I have a consideration about memory usage.

This is how it goes. Script pulls data from mysql and put results into an array and than apply natsort over it. But here is the catch. Row's text can be long and there could be hundreds of rows.

Example code:

$array = array();
while ($row = $db->getResults()) {
  $array[$row->code] = $row->text;
}

if (empty($array)) {
  uksort($array, "strnatcmp");
}

I wonder how is this affecting memory? Is this appropriate approach or should I do something more efficient, more memory pleasant?

Borut Tomazin
  • 8,041
  • 11
  • 78
  • 91

2 Answers2

2

One thing you can do is store a new column which duplicates the column you want to sort, but stores it in a transformed format that will sort naturally when using the regular sort algorithm.

Conceptually, you can do this by left-padding digit sequences with zeros to a length that will be as long as the longest possible numeric sequence that could occur in your string.

My solution isn't totally rugged, but if your strings just have digit sequences of known maximum lengths, then you can left pad them with zeros to that known max length. For example, if you had cd track titles with the track number embedded into the title like:

1 Foo
2 Bar
...
10 Baz

Maybe you decide that the longest numeric sequence possible would be 3(999 possible tracks), so you would pad the numeric sequences like

001 Foo
002 Bar
...
010 Baz

This works fine with strings that have multiple numeric sequences.

Sample php code, although you could write a mysql stored function to do this, and then use insert and update triggers on the table so that it's maintained transparently.

$input = 'a44b1c399d4';
$nat = preg_replace_callback('#\d+#', function($m) {
    return str_pad($m[0], 3, '0', STR_PAD_LEFT);
}, $input);
echo $nat; // a044b001c399d004

Then just sort in mysql via

order by natsort_column

This also lets you put an index on that column, giving you good sort performance.

goat
  • 31,486
  • 7
  • 73
  • 96
  • If I am correct than I should get results from mysql, add another column to the output and again query sql with that field? Could you give an example of mysql procedure to apply this code in first query? – Borut Tomazin Jan 27 '13 at 10:11
  • @rambo.coder: Do you have an example of the above code for mysql also? – Borut Tomazin Jan 27 '13 at 16:55
  • No, but you can write it simple enough using a stored fuction. just use a loop in it, with mysql string functions like locate() concat() substr() – goat Jan 27 '13 at 17:02
  • @rambo.coder: Thanks, but I just found a solution: http://stackoverflow.com/a/12257917/384864 – Borut Tomazin Jan 27 '13 at 17:20
  • @rambo.coder: Oh and your solution breaks if you do not have structured columns. – Borut Tomazin Jan 27 '13 at 17:23
  • @BorutTomazin What is a "structured column"? – goat Jan 27 '13 at 19:31
  • You have structured column in you example - string following numbers. That's not the case in my problem. I have column with different formats for example "19.45c", "001", "d135-3". How can I sort that kind of "unstructured" data with your solution? – Borut Tomazin Jan 28 '13 at 08:24
  • This is brilliant. I'm going to be using this to naturally sort product SKUs. I can easily store the "padded SKU" along with the normal SKU and then have MySQL alpha sort on that. Should work great. Thanks! – Tyler V. Oct 09 '14 at 20:53
-1

You need to use the MySQL WHERE, GROUP BY, and ORDER BY clauses so you don't waste time at the PHP level parsing thousands of unneeded records.

Xeoncross
  • 55,620
  • 80
  • 262
  • 364
  • I know that but how can I apply natsort in mysql? – Borut Tomazin Jan 26 '13 at 17:12
  • The important thing is to *limit* the results to only the records you need. As long as you have the records limited you can sort them in PHP just fine, probably not as fast as a MySQL stored proc, but fast enough since you shouldn't be fetching very many rows. If you *are* fetching hundreds like you say then 1) there is a problem with the design of your application, 2) it's an admin reports page a that doesn't run often and so not a big deal, 3) you need to limit the results to only the columns you need, or 4) need to implement caching once the results are ordered. – Xeoncross Jan 26 '13 at 17:18
  • @Xeoncross Limiting results and sort later in the general case is non trivial, since to get for example the correct 50 records, you'd need to sort to know which they are. There may be application specific ways of doing it, in which case I agree completely that things should be limited in the database. – Joachim Isaksson Jan 26 '13 at 17:21
  • @Xeoncross I absolutely agree with you. But how can I know where is the limit applying sort within PHP? – Borut Tomazin Jan 26 '13 at 17:22