0

Following on from this question: Django Postgresql ArrayField aggregation

I have an ArrayField of Categories and I would like to retrieve all unique values it has - however the results should be filtered so that only values starting with the supplied string are returned.

What's the "most Django" way of doing this?

Given an Animal model that looks like this:

class Animal(models.Model):
    # ...
    categories = ArrayField(
        models.CharField(max_length=255, blank=True),
        default=list,
    )
    # ...

Then, as per the other question's answer, this works for finding all categories, unfiltered.

all_categories = (
    Animal.objects
    .annotate(categories_element=Func(F('categories'), function='unnest'))
    .values_list('categories_element', flat=True)
    .distinct()
)

However, now, when I attempt to filter the result I get failure, not just with __startswith but all types of filter:

all_categories.filter(categories_element__startswith('ga'))
all_categories.filter(categories_element='dog')

Bottom of stacktrace is:

DataError: malformed array literal: "dog"
...
DETAIL:  Array value must start with "{" or dimension information.

... and it appears that it's because Django tries to do a second UNNEST - this is the SQL it generates:

...) WHERE unnest("animal"."categories") = dog::text[]

If I write the query in PSQL then it appears to require a subquery as a result of the UNNEST:

SELECT categories_element
FROM (
    SELECT UNNEST(animal.categories) as categories_element
) ul
WHERE ul.categories_element like 'Ga%';

Is there a way to get Django ORM to make a working query? Or should I just give up on the ORM and use raw SQL?

Community
  • 1
  • 1
jamesc
  • 5,852
  • 3
  • 32
  • 42
  • can maybe use https://docs.djangoproject.com/en/1.9/ref/models/querysets/#extra to get the subquery in there. `annotate` seems kind of wrong (it's usually for aggregation type functions) – Anentropic Apr 20 '16 at 17:13
  • Thanks @Anentropic - I did have a look at `extra` but couldn't see a way to add the subquery. The answer below is probably the "truest" - the schema is not fit for purpose really. I've implemented a mini query builder that selects and sorts from the ArrayFields and appears to be working nicely. – jamesc Apr 21 '16 at 10:45
  • I wondered if you could put the subquery in `tables` parameter – Anentropic Apr 21 '16 at 12:17

1 Answers1

1

You probably have the wrong database design.

Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

http://www.postgresql.org/docs/9.1/static/arrays.html

e4c5
  • 52,766
  • 11
  • 101
  • 134