
OpenAI's GPT embedding models are used across all LlamaIndex examples, even though they appear to be the most expensive and the worst-performing embedding models compared to T5 and the sentence-transformers models (see the comparison below).

How do I use all-roberta-large-v1 as the embedding model, in combination with OpenAI's GPT-3 as the "response builder"? I'm not even sure if I can use one model for creating/retrieving embedding tokens and another model to generate the response based on the retrieved embeddings.

Example

Following is an example of what I'm looking for:

from llama_index import SimpleDirectoryReader

# Load documents from the local 'data' directory
documents = SimpleDirectoryReader('data').load_data()

# Use Roberta or any other open-source model to generate embeddings
index = ???????.from_documents(documents)

# Use GPT3 here
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

print(response)

Model Comparison

Embedding Models

[Image: embedding model comparison chart (source link)]

Jay
  • Thanks for asking this question! About this: "I'm not even sure if I can use one model for creating/retrieving embedding tokens and another model to generate the response based on the retrieved embeddings" — did you find out whether it is possible, and whether the quality of the responses is still good? – Ire00 Aug 03 '23 at 16:07

1 Answer


You can set the embedding model in a service_context, using either a local model or one from HuggingFace:

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# Wrap a HuggingFace sentence-transformers model so LlamaIndex can use it for embeddings
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
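
If you also want to be explicit about which OpenAI model generates the responses, you can configure the LLM on the same service_context. This is a minimal sketch assuming a legacy (pre-0.10) llama_index release where llama_index.llms.OpenAI is available; older versions use an LLMPredictor for this instead:

from llama_index.llms import OpenAI

# Assumption: gpt-3.5-turbo as the response-synthesis model; any OpenAI chat model works here
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=embed_model,
)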

You can then either pass this service_context, or set it globally:

from llama_index import set_global_service_context

set_global_service_context(service_context)
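
With that in place, the placeholder from the question can be filled in with a regular vector index: embeddings come from the HuggingFace model, and the response is generated by the OpenAI LLM. A sketch assuming the same legacy API, where VectorStoreIndex (GPTVectorStoreIndex in older releases) accepts a service_context:

from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader('data').load_data()

# Embeddings are produced by the HuggingFace model configured above
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# The response is synthesized by the OpenAI model configured in the service_context
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)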
Greg Funtusov