I have a use case. If you have any ideas or approaches for solving it (partially or fully), please let me know.

I am using the vector database Milvus for my application. Currently, the database contains the columns content and embeddings. For each chunk (a chunk generally contains 2-3 lines of text), I generate an embedding and store it in the embeddings column.

Now, the use case. Documents about onboarding in general, and about how to do Discord onboarding, are stored in the vector DB. I query the vector DB to find all docs about Slack onboarding.

Based on the cosine similarity between the query embedding and each doc embedding, the vector DB picks the docs with the smallest distance to the query.
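To make the ranking step concrete, here is a minimal sketch in plain Python of how cosine similarity can order documents against a query. It does not use Milvus; the function names `cosine_similarity` and `rank_by_cosine` and the toy 3-dimensional vectors are assumptions made up for illustration, and the database's actual scoring may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors, purely hypothetical; real embeddings have hundreds of dimensions.
query = [1.0, 0.0, 1.0]
docs = [
    [1.0, 0.0, 0.9],  # points almost the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.2, 0.0],  # partially aligned
]
ranking = rank_by_cosine(query, docs)
print(ranking)
```

A higher similarity corresponds to a smaller cosine distance (1 minus the similarity), which is why "closest" and "most similar" pick out the same docs.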

So I got a list of docs about onboarding and Discord onboarding, each with a high score. I do not know how the score is calculated. By high score I mean that the documents in the result set have scores of approximately 83.5 and 85.5.

Since my query contains the term onboarding, I got all the onboarding docs. The query also contains slack. Is it possible to penalise all docs in the result set that do not contain slack along with onboarding?

If so, I would like to know how it can be done.
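To illustrate the kind of penalty being asked about, here is a rough sketch of re-ranking the result set after it comes back from the vector DB. It is plain Python, not a Milvus feature. The function `rerank_with_penalty`, the `penalty` factor of 0.5, the sample contents, and the 80.0 score are all hypothetical, and the sketch assumes that a higher score means a better match.

```python
def rerank_with_penalty(hits, required_terms, penalty=0.5):
    """Scale down the score of each hit once per required term it lacks.

    hits: list of (content, score) pairs, higher score assumed better.
    required_terms: terms from the query that a good hit should contain.
    penalty: multiplier in (0, 1) applied once per missing term (assumed value).
    """
    reranked = []
    for content, score in hits:
        text = content.lower()
        missing = [t for t in required_terms if t.lower() not in text]
        reranked.append((content, score * (penalty ** len(missing))))
    reranked.sort(key=lambda hit: hit[1], reverse=True)
    return reranked

# Hypothetical result set; contents and the 80.0 score are invented.
hits = [
    ("How to do Discord onboarding", 85.5),
    ("General onboarding checklist", 83.5),
    ("Slack onboarding for new hires", 80.0),
]
result = rerank_with_penalty(hits, ["slack", "onboarding"])
for content, score in result:
    print(f"{score:6.2f}  {content}")
```

With these made-up inputs, the doc that mentions both slack and onboarding keeps its score and moves to the top, while the docs that lack slack have their scores halved. Plain substring matching is the simplest possible check; it would miss synonyms or different spellings.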

James K J
