class Item(models.Model):
    vector_repr = models.TextField(..., verbose_name='jsonified vector representation')
    ...
# My current solution:
import json
import numpy as np

def as_vector(item):
    return np.asarray(json.loads(item.vector_repr))

item = Item.objects.get(...)
item_vect = as_vector(item)

def cosine_similarity(other):
    # Plain dot product: equals cosine similarity only for unit-length vectors
    return np.dot(item_vect, as_vector(other))
db_items = Item.objects.exclude(id=item.id)
similar_items = sorted(db_items, key=cosine_similarity)
Basically, I want to sort all the Items in a MySQL database by their cosine similarity to a given item.
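To make the goal concrete, here is a minimal sketch of the ranking I'm after, on plain NumPy arrays rather than my actual model. The function name `rank_by_cosine` and the toy data are hypothetical, and the sketch assumes no vector is all zeros (otherwise the norm in the denominator is zero).

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Return (order, sims): candidate row indices from most to least similar, and the scores."""
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Cosine similarity = dot product divided by the product of the two norms
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
    sims = candidates @ query / norms
    # argsort is ascending, so negate the scores to put the most similar first
    return np.argsort(-sims), sims

# Toy data (hypothetical), standing in for the decoded vector_repr values
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
order, sims = rank_by_cosine(query, candidates)
```

Here `order` lists the row indices of `candidates` with the closest match first, and `sims` holds the cosine similarity of each row to `query`.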
The problem is that the vector representing each item (vector_repr) is very large and there are a lot of items in the database, so this method is really slow (~2 min).
How can I speed this up? (If possible, without storing the similarity of every pair of items in my db.)