I have a use case and would appreciate any ideas or approaches for solving it, partially or fully.
I am using the vector database Milvus for my application. Currently, the database contains two columns, content and embeddings. For each chunk (a chunk generally contains 2-3 lines of text), I generate an embedding and store it in the embeddings column.
Now, the use case. The vector DB stores documents about onboarding in general and about how to do Discord onboarding. I query the vector DB to find all docs that contain "slack onboarding".
Based on the cosine similarity between the query embedding and each doc's embedding, the vector DB picks the docs with the smallest distance to the query.
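To make sure I am describing the retrieval step correctly, here is a minimal sketch of how I understand it, in plain Python rather than through Milvus. The names `cosine_similarity` and `top_k` and the toy 2-dimensional vectors are my own assumptions for illustration; the real search is done by the database, and its reported score may be scaled differently.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding, docs, k=2):
    # docs is a list of (content, embedding) pairs (hypothetical layout).
    scored = [
        (cosine_similarity(query_embedding, embedding), content)
        for content, embedding in docs
    ]
    # Highest similarity first, which corresponds to the smallest distance.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy data standing in for real chunk embeddings.
docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
results = top_k([1.0, 0.0], docs, k=2)
```

With this toy data, "a" ranks first because it points in the same direction as the query, and "c" ranks second.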
So I got a list of docs about onboarding and Discord onboarding, each with a high score. I do not know how the score is calculated. By high score, I mean that the documents in the result set have scores of approximately 83.5 and 85.5.
Since my query contains the term onboarding, I got all the onboarding docs. The query also contains slack. Is it possible to penalise all docs in the result set that do not contain slack along with onboarding?
If so, I would like to know how it can be done.
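To make the desired behaviour concrete, here is a rough sketch of the kind of penalty I have in mind, applied on the client side after the search returns. It is not a Milvus feature: the function name `penalise_missing_terms`, the penalty factor of 0.5, and the sample contents are all assumptions for illustration. It also assumes that a higher score means a better match and that a simple case-insensitive substring check is good enough.

```python
def penalise_missing_terms(results, required_terms, penalty=0.5):
    # results is a list of (score, content) pairs, higher score = better.
    reranked = []
    for score, content in results:
        text = content.lower()
        # Count how many required terms do not appear in the content.
        missing = sum(1 for term in required_terms if term.lower() not in text)
        # Multiply the score by the penalty once per missing term.
        reranked.append((score * (penalty ** missing), content))
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return reranked

# Made-up results resembling what I get back for "slack onboarding".
results = [
    (85.5, "How to do Discord onboarding"),
    (83.5, "General onboarding checklist"),
    (80.0, "Slack onboarding steps"),
]
reranked = penalise_missing_terms(results, required_terms=["slack"])
```

Here the doc that mentions slack moves to the top, and the other two keep their relative order with halved scores. I do not know whether something equivalent can be done inside the database itself, which is what I would prefer.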