Questions tagged [llama-cpp-python]
7 questions
1
vote
1 answer
Very slow Response from LLM based Q/A query engine
I built a Q/A query bot over a 4 MB CSV file I have locally. I'm using Chroma for vector DB creation, with Instructor Large from Hugging Face as the embedding model and LlamaCPP llama2-13b-chat as the LLM chat model. The vector database created…

Avish Wagde
- 33
- 4
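
A minimal sketch of the kind of pipeline this question describes, wired up with a LangChain-style stack (the question may equally be using LlamaIndex); the CSV path, model file name, and parameter values below are assumptions, not taken from the question:

from langchain.document_loaders import CSVLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

# Load the ~4 MB CSV and build the Chroma vector store with Instructor Large embeddings.
docs = CSVLoader("data.csv").load()  # placeholder path
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma.from_documents(docs, embeddings, persist_directory="chroma_db")

# llama-2-13b-chat via llama.cpp; n_gpu_layers=0 means CPU-only inference, which is
# the usual reason a 13B model answers very slowly.
llm = LlamaCpp(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # placeholder model file
    n_ctx=2048,
    n_gpu_layers=0,
)

qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever(search_kwargs={"k": 2}))
print(qa.run("What does the CSV say about X?"))

Offloading layers to a GPU (n_gpu_layers > 0) or switching to a smaller quantised model are the two levers that usually matter most for latency in this setup.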
0
votes
0 answers
Llama 2 output format is not parsed correctly
I've been unable to find a solution to my question despite multiple attempts. I'm using Llama 2 with LangChain for the first time. The challenge I'm facing is extracting the response from Llama in…

Udemytur
- 79
- 1
- 5
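
When this comes up, the issue is often that the raw completion string still needs post-processing; a minimal sketch of wrapping LangChain's LlamaCpp in the Llama-2 chat template and splitting the answer, with a placeholder model path (none of the names below come from the question):

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

# Llama-2 chat models expect the [INST] / <<SYS>> wrapper around the user message.
template = (
    "[INST] <<SYS>>\nYou are a helpful assistant. Answer with a comma-separated list.\n<</SYS>>\n\n"
    "{question} [/INST]"
)
prompt = PromptTemplate.from_template(template)

llm = LlamaCpp(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)  # placeholder path

raw = llm(prompt.format(question="Name three primary colors."))
# LlamaCpp returns only the completion text, so parsing is plain string handling.
items = [part.strip() for part in raw.strip().split(",")]
print(items)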
0
votes
2 answers
AssertionError when using llama-cpp-python in Google Colab
I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to do inference using the Llama LLM in Google Colab. My code looks like this:
!pip install llama-cpp-python
from llama_cpp import ChatCompletionMessage, Llama
model = Llama(
…

Utrax
- 3
- 2
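
For reference, a hedged sketch of a working Colab cell; the model path is a placeholder, and the comment about the assertion reflects a common cause rather than a diagnosis of this specific case:

!pip install llama-cpp-python

from llama_cpp import Llama

# An AssertionError raised inside Llama.__init__ usually means the model file
# failed to load, e.g. a wrong path or a format the installed build does not
# support (a GGML file with a GGUF-only build, or vice versa).
model = Llama(
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,
)

out = model.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(out["choices"][0]["message"]["content"])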
0
votes
0 answers
Inferencing LLAMA-2 13B
I have been running inference with LLaMA-13B when the following error suddenly arose:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call…

Malik
- 29
- 3
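
The traceback is truncated, so the cause is unclear; as a hedged sketch, two cheap things to rule out are a model file that fails to load and a prompt longer than the context window (the model path below is a placeholder, not from the question):

from llama_cpp import Llama

llm = Llama(model_path="llama-2-13b.Q4_K_M.gguf", n_ctx=2048)  # placeholder path

prompt = "Summarise the plot of Hamlet in two sentences."
tokens = llm.tokenize(prompt.encode("utf-8"))
# Generation needs room for the prompt plus the new tokens inside n_ctx.
assert len(tokens) < llm.n_ctx(), "prompt is longer than the context window"

print(llm(prompt, max_tokens=128)["choices"][0]["text"])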
0
votes
0 answers
llama-cpp-python on MacBook (M1) gets an unexpected series of '\x1c' as output
In order to use the GPU on a MacBook (M1 chip), install llama-cpp-python with:
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
Download the model file from https://huggingface.co/TheBloke/Trurl-2-7B-GGML/tree/main
The model name is…

geralt
- 1
- 3
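
A hedged way to narrow this down is to compare a Metal-offloaded run with a CPU-only run of the same GGML file; the file name below is a placeholder for the Trurl-2-7B model:

from llama_cpp import Llama

def generate(n_gpu_layers: int) -> str:
    llm = Llama(
        model_path="trurl-2-7b.ggmlv3.q4_0.bin",  # placeholder file name
        n_ctx=2048,
        n_gpu_layers=n_gpu_layers,  # 1 offloads to Metal on Apple Silicon, 0 stays on CPU
    )
    out = llm("Q: What is the capital of Poland? A:", max_tokens=32)
    return out["choices"][0]["text"]

# If only the Metal run prints '\x1c' garbage, the GPU code path (or the chosen
# quantisation's Metal support) is the thing to investigate.
print("Metal:", repr(generate(1)))
print("CPU:  ", repr(generate(0)))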
0
votes
1 answer
llama-cpp-python not using NVIDIA GPU CUDA
I have been playing around with oobabooga text-generation-webui on my Ubuntu 20.04 with my NVIDIA GTX 1060 6GB for some weeks without problems. I have been using llama2-chat models sharing memory between my RAM and NVIDIA VRAM. I installed without…

imbr
- 6,226
- 4
- 53
- 65
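
The usual fix is to rebuild llama-cpp-python with the CUDA (cuBLAS) backend and then request layer offload explicitly; the model path and layer count below are assumptions, and how many layers fit in 6 GB of VRAM depends on the model:

# Rebuild with CUDA support first (shell), then request GPU offload from Python:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=35,  # assumption; tune to what fits in 6 GB of VRAM
    n_ctx=2048,
)
# With a CUDA build, the startup log should mention cuBLAS and report layers offloaded to the GPU.
print(llm("Hello", max_tokens=16)["choices"][0]["text"])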
-2
votes
0 answers
Cannot install the llamacpp module provided by LangChain
n_gpu_layers = 32 # Change this value based on your model and your GPU VRAM pool.
n_batch = 256 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
# Load the model
llm = LlamaCpp(
…
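
For comparison, a hedged sketch of the same setup once llama-cpp-python itself is installed (the usual cause of this import failure); the model path and values below are placeholders:

# Install the backend package first:  pip install llama-cpp-python langchain
from langchain.llms import LlamaCpp

n_gpu_layers = 32  # change this value based on your model and your GPU VRAM pool
n_batch = 256      # should be between 1 and n_ctx; consider the amount of VRAM in your GPU

llm = LlamaCpp(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=2048,
    verbose=True,
)
print(llm("Name one prime number greater than 10."))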