
When I query a simple vector index created using LlamaIndex, it returns a JSON object that has the response for the query and the source nodes (with the score) it used to generate an answer. How does it calculate which nodes to use? (I'm guessing semantic search?)

Is there a way to just return the nodes, so that it doesn't use OpenAI's API (because that costs money)? I was using gpt-3.5-turbo to get answers for the query.

I tried searching the LlamaIndex documentation, but I couldn't find anything.


2 Answers


Nodes are found by a similarity search between the query embedding and the source embeddings.

In simple terms, the sentences in your sources are converted to vector embeddings (long 1-D arrays of numbers). The query is also converted to an embedding. To check whether a source vector is similar to the query vector, taking the dot product gives a good estimate of how similar they are. Among all source embeddings, the ones with the highest dot products are returned (top_k sets how many to return, so top_k=5 returns the top 5).
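Here is a rough sketch of that idea in plain NumPy (illustrative only; the embedding values are made up and none of this is LlamaIndex's actual code):

import numpy as np

# Hypothetical pre-computed embeddings: one row per source sentence.
# In practice these come from an embedding model, not random numbers.
source_embeddings = np.random.rand(100, 384)  # 100 sentences, 384-dim vectors
query_embedding = np.random.rand(384)         # embedding of the query

# Dot product of the query with every source vector = similarity scores.
scores = source_embeddings @ query_embedding

# Keep the indices of the top_k highest-scoring sources.
top_k = 5
top_indices = np.argsort(scores)[::-1][:top_k]

for i in top_indices:
    print(f"source #{i}, score {scores[i]:.4f}")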

Coming to the LlamaIndex part of your question: you can use the index's as_retriever(). It only returns the matching nodes from your sources (along with the dot-product scores, a.k.a. similarity scores, and any metadata added by you), without calling the LLM at all.

# Retrieval uses embeddings only, so no LLM (and no OpenAI completion cost) is involved.
retriever = index.as_retriever()
nodes = retriever.retrieve(query)
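In the LlamaIndex versions I've used, retrieve() gives back NodeWithScore objects, so you can inspect the text, score, and metadata without ever touching the LLM (attribute names may differ slightly between versions):

# Each result is a NodeWithScore: a text chunk plus its similarity score.
for node_with_score in nodes:
    print(node_with_score.score)            # similarity score
    print(node_with_score.node.get_text())  # the source text chunk
    print(node_with_score.node.metadata)    # any metadata you attached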

You can make your index act as a retriever, and then query it with response_mode='no_text', as described in this LlamaIndex tutorial: https://github.com/jerryjliu/llama_index/blob/3c338ea59be0bc9b4b98bce3fdc6be895409852a/docs/core_modules/query_modules/response_synthesizers/usage_pattern.md#configuring-the-response-mode

from llama_index.response_synthesizers import get_response_synthesizer

# 'no_text' skips the LLM synthesis step, so only the retrieved source
# nodes are returned and no completion tokens are spent.
response_synthesizer = get_response_synthesizer(response_mode='no_text')

query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
response = query_engine.query("query_text")
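With response_mode='no_text' the answer text is empty, but the retrieved chunks and their scores should still be attached to the response object (via the source_nodes attribute, in the versions I've seen):

# No answer text is generated, but the matched nodes are still returned.
for source_node in response.source_nodes:
    print(source_node.score)
    print(source_node.node.get_text())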