1

I am using llama_index with custom LLM. LLM I have used is open assistant Pythia model.

My code :

import os
from llama_index import (
    GPTKeywordTableIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
    PromptHelper
)
from langchain import OpenAI

import torch
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex
from llama_index import LLMPredictor, ServiceContext
from transformers import pipeline
from typing import Optional, List, Mapping, Any

from transformers import AutoModelForCausalLM, AutoTokenizer



# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


class CustomLLM(LLM):
    model_name="OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
    tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5", padding_side="left")
    model = AutoModelForCausalLM.from_pretrained("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5", 
                                             load_in_8bit=True,
                                             device_map="auto")
    #pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype":torch.bfloat16})
    pipeline = pipeline(
        "text-generation",
        model=model, 
        tokenizer=tokenizer, 
        max_length=512,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.15
    )

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

    
os.environ['OPENAI_API_KEY'] = 'demo'
documents = SimpleDirectoryReader('data').load_data()


# define LLM
llm_predictor = LLMPredictor(llm=CustomLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# build index
index = GPTKeywordTableIndex.from_documents(documents, service_context=service_context)

# get response from query
query_engine = index.as_query_engine()
response = query_engine.query("What is capital of france?");

print(response)

Now I have a data directory with one file named "france.txt". In this file, I have written "The captial of France is XYZ".

But still above code is answering Paris. How can i avoid out of context answering. Basically I want it to answer only based on my input files(which is france.txt) in this case

Ankit Bansal
  • 2,162
  • 8
  • 42
  • 79

1 Answers1

0

I believe you have to specify this in the prompt explicitly (or in the prompt template). If going the template route, you can create a custom prompt (follow tutorials on llama index docs) where you can specify you want the model to only use the context provided and not prior knowledge. Hope this helps.

Ps, some tips:

  1. You need to pass in your prompt_helper to service_context or it’s not used.

  2. No need to include information about OpenAI as you’re not using their API.

  3. In your call overload definition, you can use pipeline.run(prompt) to only get back generated text.

Good luck, I’m playing the same game as you rn and it can get complicated very quickly with large context databases.

Richard
  • 33
  • 4