29

I'm in a situation where I must output a quite large list of objects by a CharField used to store street addresses.

My problem is, that obviously the data is ordered by ASCII codes since it's a Charfield, with the predictable results .. it sort the numbers like this;

1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21....

Now the obvious step would be to change the Charfield the proper field type (IntegerField let's say), however it cannot work since some address might have apartments .. like "128A".

I really don't know how I can order this properly ..

h3.
  • 10,688
  • 15
  • 51
  • 54

10 Answers10

28

If you're sure there are only integers in the field, you could get the database to cast it as an integer via the extra method, and order by that:

MyModel.objects.extra(
    select={'myinteger': 'CAST(mycharfield AS INTEGER)'}
).order_by('myinteger')
Daniel Roseman
  • 588,541
  • 66
  • 880
  • 895
  • 1
    Not all addresses start with a number. This approach works for special-case of "I have numbers in my char field", but will fail to sort mixed data. – Dave W. Smith Jun 04 '10 at 00:29
  • Very interesting use of extra, I rarely play with that method.. But it doesn't seem to work in my situation unfortunately. – h3. Jun 04 '10 at 12:31
  • 3
    Use 'SIGNED' or 'UNSIGNED' instead of INTEGER if your MYSQL version doesn't support it. – Adriaan Tijsseling Jul 06 '12 at 11:35
  • 5
    Since django is deprecating `extra()`, here's how to do the same thing using `annotate()`: `MyModel.objects.annotate(myinteger=RawSQL('CAST(mycharfield AS UNSIGNED)', params=[])).order_by('myinteger')` – coredumperror Mar 21 '18 at 00:03
23

Django is trying to deprecate the extra() method, but has introduced Cast() in v1.10. In sqlite (at least), CAST can take a value such as 10a and will cast it to the integer 10, so you can do:

from django.db.models import IntegerField
from django.db.models.functions import Cast

MyModel.objects.annotate(
    my_integer_field=Cast('my_char_field', IntegerField())
).order_by('my_integer_field', 'my_char_field')

which will return objects sorted by the street number first numerically, then alphabetically, e.g. ...14, 15a, 15b, 16, 16a, 17...

phoenix
  • 7,988
  • 6
  • 39
  • 45
practual
  • 2,363
  • 2
  • 11
  • 12
  • 2
    `Cast` doesn't support values such as `10a`. If it receives one, it throws `DataError: invalid input syntax for integer`. I found a workaround by removing all characters in the value that are not numbers (PostgreSQL): `from django.db.models.expressions import F, Value, Func` `queryset.annotate(my_integer_field=Cast( Func(F('my_char_field'), Value('[^\d]'), Value(''), Value('g'), function='regexp_replace'), IntegerField()) )` – Aylen Mar 16 '17 at 15:38
  • 1
    @Filly can you give the full example that threw the error? – practual Mar 22 '17 at 22:59
  • @practual your solution works on Sqlite3 but gives error on PostgreSQL like in Filly ' comment. – ishak O. Jul 07 '20 at 09:55
  • when all the values are numeric, it works fine, but when there is a string value on results: invalid input syntax for type integer: "x13079" – cem Jan 27 '22 at 20:22
18

If you're using PostgreSQL (not sure about MySQL) you can safely use following code on char/text fields and avoid cast errors:

MyModel.objects.extra(
    select={'myinteger': "CAST(substring(charfield FROM '^[0-9]+') AS INTEGER)"}
).order_by('myinteger')
darklow
  • 2,249
  • 24
  • 22
  • This is the best if you have to sort by strings that have numbers, with a pattern '.[0-9]+' – Ajoy Feb 09 '15 at 09:00
  • The syntax for MariaDB would be: "CAST(REGEXP_SUBSTR(name, '^[0-9]+') AS INTEGER)". Thanks!! – Igor Sobreira Apr 14 '15 at 01:15
  • Awesome answer, how can I do this when I am looking up a foreign key for that field? – Alex Stewart Sep 01 '17 at 02:29
  • This does not work for me, the error that I get is: `ProgrammingError: syntax error at or near "[" LINE 1: ...ECT DISTINCT ('SELECT CAST(substring(value FROM '^[0-9]+') A...`. I use PostgreSQL 11, Python 2.7 and Django 1.11. I think it may be because of old versions I have. – andramos Mar 12 '21 at 20:17
4

I know that I’m late on this, but since it’s strongly related to the question, and that I had a hard time finding this:

You have to know that you can directly put the Cast in the ordering option of your model.

from django.db import models
from django.db.models.functions import Cast


class Address(models.Model):

    street_number = models.CharField()

    class Meta:
        ordering = [
            Cast("street_number", output_field=models.IntegerField()),
        ]

From the doc about ordering:

You can also use query expressions.

And from the doc about database functions:

Functions are also expressions, so they can be used and combined with other expressions like aggregate functions. 

DevOps Craftsman
  • 355
  • 3
  • 16
3

Great tip! It works for me! :) That's my code:

revisioned_objects = revisioned_objects.extra(select={'casted_object_id': 'CAST(object_id AS INTEGER)'}).extra(order_by = ['casted_object_id'])
diegor
  • 75
  • 1
  • 8
2

The problem you're up against is quite similar to how filenames get ordered when sorting by filename. There, you want "2 Foo.mp3" to appear before "12 Foo.mp3".

A common approach is to "normalize" numbers to expanding to a fixed number of digits, and then sorting based on the normalized form. That is, for purposes of sorting, "2 Foo.mp3" might expand to "0000000002 Foo.mp3".

Django won't help you here directly. You can either add a field to store the "normalized" address, and have the database order_by that, or you can do a custom sort in your view (or in a helper that your view uses) on address records before handing the list of records to a template.

Dave W. Smith
  • 24,318
  • 4
  • 40
  • 46
2

In my case i have a CharField with a name field, which has mixed (int+string) values, for example. "a1", "f65", "P", "55" e.t.c ..

Solved the issue by using the sql cast (tested with postgres & mysql), first, I try to sort by the casted integer value, and then by the original value of the name field.

parking_slots = ParkingSlot.objects.all().extra(
        select={'num_from_name': 'CAST(name AS INTEGER)'}
    ).order_by('num_from_name', 'name')

This way, in any case, the correct sorting works for me.

Webdma
  • 714
  • 6
  • 16
1

In case you need to sort version numbers consisting of multiple numbers separated by a dot (e.g. 1.9.0, 1.10.0), here is a postgres-only solution:

class VersionRecordManager(models.Manager):

    def get_queryset(self):
        return super().get_queryset().extra(
            select={
                'natural_version': "string_to_array(version, '.')::int[]",
            },
        )

    def available_versions(self):
        return self.filter(available=True).order_by('-natural_version')

    def last_stable(self):
        return self.available_versions().filter(stable=True).first()

class VersionRecord(models.Model):
    objects = VersionRecordManager()
    version = models.CharField(max_length=64, db_index=True)
    available = models.BooleanField(default=False, db_index=True)
    stable = models.BooleanField(default=False, db_index=True)

In case you want to allow non-numeric characters (e.g. 0.9.0 beta, 2.0.0 stable):

def get_queryset(self):
    return super().get_queryset().extra(
        select={
            'natural_version':
                "string_to_array(                     "  
                "   regexp_replace(                   "  # Remove everything except digits
                "       version, '[^\d\.]+', '', 'g'  "  # and dots, then split string into
                "   ), '.'                            "  # an array of integers.
                ")::int[]                             "
        }
    )
Max Malysh
  • 29,384
  • 19
  • 111
  • 115
1

I was looking for a way to sort the numeric chars in a CharField and my search led me here. The name fields in my objects are CC Licenses, e.g., 'CC BY-NC 4.0'.

Since extra() is going to be deprecated, I was able to do it this way:

MyObject.objects.all()
    .annotate(sorting_int=Cast(Func(F('name'), Value('\D'), Value(''), Value('g'), function='regexp_replace'), IntegerField()))
    .order_by('-sorting_int')

Thus, MyObject with name='CC BY-NC 4.0' now has sorting_int=40.

Rob L
  • 3,634
  • 2
  • 19
  • 38
0

All the answeres in this thread did not work for me because they are assuming numerical text. I found a solution that will work for a subset of cases. Consider this model

Class Block(models.Model):
      title = models.CharField()

say I have fields that sometimes have leading characters and trailing numerical characters If i try and order normally

 >>> Block.objects.all().order_by('title')
<QuerySet [<Block: 1>, <Block: 10>, <Block: 15>, <Block: 2>, <Block: N1>, <Block: N12>, <Block: N4>]>

As expected, it's correct alphabetically, but makes no sense for us humans. The trick that I did for this particular use case is to replace any text i find with the number 9999 and then cast the value to an integer and order by it.

for most cases that have leading characters this will get the desired result. see below

from django.db.models.expressions import RawSQL

>>> Block.objects.all()\
.annotate(my_faux_integer=RawSQL("CAST(regexp_replace(title, '[A-Z]+', '9999', 'g') AS INTEGER)", ''))\    
.order_by('my_faux_integer', 'title')
    
<QuerySet [<Block: 1>, <Block: 2>, <Block: 10>, <Block: 15>, <Block: N1>, <Block: N4>, <Block: N12>]>
Dr Manhattan
  • 13,537
  • 6
  • 45
  • 41