0

Let's say I have MySQL database records with this structure

{
            "id": 44207,
            "actors": [
                {
                    "id": "9c88bd9c-f41b-59fa-bfb6-427b1755ea64",
                    "name": "APT41",
                    "scope": "confirmed"
                },
                {
                    "id": "6f82bd9c-f31b-59fa-bf26-427b1355ea64",
                    "name": "APT67",
                    "scope": "confirmed"
                }
            ],
},
{
            "id": 44208,
            "actors": [
                {
                    "id": "427b1355ea64-bfb6-59fa-bfb6-427b1755ea64",
                    "name": "APT21",
                    "scope": "confirmed"
                },
                {
                    "id": "9c88bd9c-f31b-59fa-bf26-427b1355ea64",
                    "name": "APT22",
                    "scope": "confirmed"
                }
            ],
},
...

"actors" is a JSONField

Any way I can filter all of the objects whose actors name contains '67', for example?

Closest variant I have is that I got it working like that:

queryset.filter(actors__contains=[{"name":"APT67"}])

But this query matches by exact actor.name value, while I want to to accept 'contains' operator.

I also have it working by quering with strict array index, like this:

queryset.filter(actors__0__name__icontains='67')

But it only matches if first element in array matches my request. And I need that object shall be returned in any of his actors matches my query, so I was expecting something like queryset.filter(actors__name__icontains='67') to work, but it's not working :(

So far I have to use models.Q and multiple ORs to support my needs, like this -

search_query = models.Q(actors__0__name__icontains='67') | models.Q(actors__1__name__icontains='67') | models.Q(actors__2__name__icontains='67') | models.Q(actors__3__name__icontains='67')
queryset.filter(search_query)

but this looks horrible and supports only 4 elements lookup(or I have to include more OR's)

Any clues if thats possible to be solved normal way overall?

danronmoon
  • 3,814
  • 5
  • 34
  • 56
farefray
  • 16
  • 3
  • UPD: So, I did a raw MySQL query to query things I needed. Query is like that - ``LOWER(JSON_UNQUOTE(JSON_EXTRACT(`database`.`actors`, '$[*].\"name\"'))) LIKE LOWER('%67%')`` That's seems like doing what I need. However I have no clues how to generate such a query with Django. Closes I have is ``search_query = models.Q(actors__name__icontains='67')`` However there is a problem that it generates $.\"name\" in SQL while should be $[*].\"name\" and that's where I am stuck for now. – farefray Nov 30 '22 at 13:46

2 Answers2

0

Following this answer and the linked answer in the same post.

'contains' or 'icontains' looks for the patterns '%string%', which in your case assumes '67' is between characters. But, the number pattern is at the end of your actor name.

So, based on the answers I linked, you should probably try endswith or iendswith, in order to look for the pattern '%67'

Niko
  • 3,012
  • 2
  • 8
  • 14
  • Thank you for the help, however unfortunately this is not working. Most likely due to 'actors' being a JSON field or smt. I've tried queryset.filter(actors__name__iendswith="67") or other alternatives, as well as queryset.filter(actors__name__icontains='PT6') At database level it generates queries like this: LOWER(JSON_UNQUOTE(JSON_EXTRACT(`table`.`actors`, '$.\"name\"'))) LIKE LOWER('%67')) which are not fetching data properly. I gonna play around RAW SQL now, to figure out query which shall work in my case and then will try to construct this query in Django... – farefray Nov 30 '22 at 09:17
  • Could you explain how data is being stored? Is it a charfield / textfield containing the string representation of your JSON? I am quite confused. Would be nice if you share your models. – Niko Nov 30 '22 at 14:24
  • its a models.JSONField(blank=True, null=True) which contains array inside. Thank you for the help, I kinda have working solution which is super questionable tho xD Works for my needs for now anyway. – farefray Nov 30 '22 at 16:38
  • ```MyModel.objects.filter(data__contains=[{"name":"APT67"}])``` works for me. I am using MySQL 8.0. So does ```MyModel.objects.filter(data__0__name__icontains='67')``` which I have tested by using the lists that has APT67 in its original form and reversed in two objects instances. – Niko Nov 30 '22 at 18:00
  • yeah, those two examples are working, but what if you need to look for all objects in data and check if the name 'contains' 67. Then I couldnt find a suitable way, `MyModel.objects.filter(data__contains=[{"name":"APT67"}])` is not supporting 'contains', it will search by only full APT67 value. And `MyModel.objects.filter(data__0__name__icontains='67')` will do what we need, but only will search in data[0] index, not all of the objects in array. Something like `MyModel.objects.filter(data__*__name__icontains='67')` would work, but there's no such syntaxis from what I recall :) – farefray Dec 01 '22 at 16:50
0

My data model:

class MyCustomModel(models.Model):
   id = models.BigAutoField(primary_key=True)
   actors = models.JSONField(blank=True, null=True)

I ended up with quite hacky lookup operator which replaces '$."' into '$[*]."' in my JSON field queries, which in my case was making the correct query, filtering all the objects whos JSON field with array of objects, contains one of the needed property.

Lookup operator:

from django.db.models.lookups import IContains
from django.db.models import Field

# Custom lookup which acts like the default IContains lookup but also replaces field name to match all JSON array objects ['*'].field_name in the query instead of $.field_name.
# Maybe this could be done in a better way with better Q field path, but this works for now.
class JSONArrayContains(IContains):
    lookup_name = 'jsonarraycontains'
    
    def __init__(self, lhs, rhs):
        self.lookup_name = 'icontains' # we fake the lookup name to get the right operators further
        super().__init__(lhs, rhs)
        

    def as_sql(self, compiler, connection):
        lhs_sql, params = self.process_lhs(compiler, connection)

        # !! HERE IS THE MAGIC
        # we need to replace params parts which are like '$."name"' into parts like '$[*]."name"' if param is string and matches $." pattern
        params = [param.replace('$."', '$[*]."') if isinstance(param, str) and param.startswith('$."') else param for param in params] 

        rhs_sql, rhs_params = self.process_rhs(compiler, connection)
        params.extend(rhs_params)
        rhs_sql = self.get_rhs_op(connection, rhs_sql)
        return f'{lhs_sql} {rhs_sql}', params

Usage:

queryset.filter(actors__name__jsonarraycontains='67')

Which is filtering all the records, which

farefray
  • 16
  • 3