0

I have the following models:

class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff
    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff

A Member is basically just a pointer to a Feature. So let's say I have Features:

  • feature_id = 2, members = 1, 2
  • feature_id = 4
  • feature_id = 3

Then the members would be:

  • id = 1, ref = 4
  • id = 2, ref = 3

I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:

# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp

This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:

print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())

... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:

  1. Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.

  2. Use some feature of Many-To-Many Fields which I don't know abut to more efficiently check for matches

  3. Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?

Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.

EDIT - I think a much faster way to get ndids is like this:

ndids = ndtmp.values_list('feature_id', flat=True)

This works, producing a non-empty set of ids. Unfortunately, I am still at a loss as to how to get okmems. I tried:

okmems = Member.objects.filter(ref__in=str(ndids))

But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:

Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
Max von Hippel
  • 2,856
  • 3
  • 29
  • 46

2 Answers2

0

You should take a look at annotate:

okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
  • I think you're onto something, but the "models.Count('feature')" part doesn't make sense to me. A Feature has a many-to-many "list" full of Members, not the other way around. `okmems` should contain members which point to Features in my accepted features list (which I successfully make). – Max von Hippel Nov 14 '16 at 09:17
  • I'm sorry if I'm being unclear, I'd like to abstract as much as possible but obviously this is a bit of a gnarly, particular use case. Basically, I already retrieved all of the Features which do not contain any Members. Let's call that "Set A". Now I want to get all the Features whose Members at the very least overlap with "Set A". The operative issue appears to be the Member "ref" to Feature "feature_id" comparison, but I'm not sure. – Max von Hippel Nov 14 '16 at 09:19
  • I think the root of the question is how to compare a CharField id to a QuerySet of BigInt ids. I'm going to try going down [this path](http://stackoverflow.com/a/2513473/1586231) and see where it leads. – Max von Hippel Nov 14 '16 at 09:59
  • Sorry, I somehow skipped the CharField part for some reason. Frankly, this seems like a bad database design to me. I have encountered a schema like this, and it was pure pain to maintain. On your issue - you might want to check id ordering in both `Member.ref` and your queries - `'1, 2, 3'` and `'2, 1, 3'` are different strings even though ids are the same. – Назар Топольський Nov 14 '16 at 10:45
  • Very true, on both fronts. I'll probably end up investing in figuring out how to migrate to BigInt on both tomorrow. – Max von Hippel Nov 14 '16 at 11:48
0

Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but as some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:

# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra({'feature_id_str':"CAST( \
    feature_id AS VARCHAR)"}).order_by( \
    '-feature_id_str').values_list('feature_id_str',flat=True).distinct()
# find all members whose ref values can be found in stride
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!

Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.

Max von Hippel
  • 2,856
  • 3
  • 29
  • 46
  • I figured this out in large part using [this answer](http://stackoverflow.com/a/28101648/1586231) and [this question, not the answers to it though haha](http://stackoverflow.com/questions/32068949/django-queryset-cast-varchar-field-in-sql-before-filtering). – Max von Hippel Nov 15 '16 at 05:27