Questions tagged [llamacpp]
14 questions
2
votes
1 answer
Suppress LlamaCpp stats output
How can I suppress LlamaCpp stats output in LangChain ...
Equivalent code:
llm = LlamaCpp(model_path=..., ....)
llm('who is Caesar')
> who is Caesar ?
Julius Caesar was a Roman general and statesman who played a critical role in the events that…

sten
- 7,028
- 9
- 41
- 63
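A minimal sketch of the usual fix, assuming a recent llama-cpp-python: the underlying Llama client accepts verbose=False, and LangChain's LlamaCpp exposes a matching verbose field (the model path below is a placeholder):
from langchain.llms import LlamaCpp

# verbose=False asks llama-cpp-python not to print its load and timing stats
llm = LlamaCpp(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
               verbose=False)
print(llm("who is Caesar?"))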
1
vote
0 answers
Deploy app with llama-cpp-python dependency on Vercel
Can't deploy my app that requires llama-cpp-python to Vercel (sorry if this is a newbie question):
(venv) bacelar@bnr:~/www/2023/python/$ vercel --force
Vercel CLI 30.2.3
Inspect: https://vercel.com/ [1s]
Error: Command failed: pip3.9…

cbacelar
- 545
- 7
- 20
0
votes
2 answers
AssertionError when using llama-cpp-python in Google Colab
I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to do inference using the Llama LLM in Google Colab. My code looks like this:
!pip install llama-cpp-python
from llama_cpp import ChatCompletionMessage, Llama
model = Llama(
…

Utrax
- 3
- 2
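The assertion in Llama.__init__ most often fires because model_path does not point to an existing file in the Colab VM. A sketch that fails fast with a clearer message (the path is a placeholder):
import os
from llama_cpp import Llama

model_path = "/content/llama-2-7b-chat.ggmlv3.q4_0.bin"  # placeholder; point at the downloaded file
assert os.path.isfile(model_path), f"model file not found: {model_path}"

model = Llama(model_path=model_path)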
0
votes
0 answers
Inferencing LLAMA-2 13B
I have been running inference with LLaMA-2 13B when the following error suddenly arose:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call…

Malik
- 29
- 3
0
votes
1 answer
llama-cpp-python not using NVIDIA GPU CUDA
I have been playing around with oobabooga text-generation-webui on my Ubuntu 20.04 with my NVIDIA GTX 1060 6GB for some weeks without problems. I have been using llama2-chat models sharing memory between my RAM and NVIDIA VRAM. I installed without…

imbr
- 6,226
- 4
- 53
- 65
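The usual cause is a wheel built without CUDA support. A sketch of the common fix: rebuild llama-cpp-python with cuBLAS enabled, then request GPU offload explicitly (the path is a placeholder; the layer count must be tuned to the 1060's 6 GB of VRAM):
# rebuild the binding with cuBLAS support (run in the shell):
# CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python

from llama_cpp import Llama

# offload as many layers as fit in VRAM; tune per model and quantization
llm = Llama(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder
            n_gpu_layers=32)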
0
votes
0 answers
error loading model: MapViewOfFile failed: Not enough memory resources are available to process this command
PC specs: Ryzen 5700X, 32 GB RAM, 100 GB free SSD space, RTX 3060 with 12 GB VRAM.
I'm trying to run the llama-2-7b-chat model locally. I followed every instruction step and first converted the model to GGML FP16 format:
python convert.py .\models\llama-2-7b-chat\…

Dolir Dollar
- 1
- 1
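One common mitigation, sketched under the assumption that the FP16 file is simply too large to map: a 7B model in FP16 is roughly 13 GB, while q4_0 is about 4 GB, and memory-mapping can also be disabled outright (paths are placeholders):
# shrink the model first (run in the shell, from the llama.cpp directory):
# .\quantize .\models\llama-2-7b-chat\ggml-model-f16.bin .\models\llama-2-7b-chat\ggml-model-q4_0.bin q4_0

from llama_cpp import Llama

# use_mmap=False reads the file instead of memory-mapping it (avoids MapViewOfFile)
llm = Llama(model_path="./models/llama-2-7b-chat/ggml-model-q4_0.bin",  # placeholder
            use_mmap=False)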
0
votes
0 answers
LLAMA CPP Python binding raises SIGILL Error on Pycharm and terminal while the same code works perfectly fine on Anaconda Jupyter Notebook
LLAMACPP Pycharm
I am trying to run quantised LLaMA-2 models on my Mac, referring to the link above.
When I run the code below in a Jupyter notebook, it works fine and gives the expected output. However, it raises a SIGILL error when run from PyCharm. Could…

Jason
- 676
- 1
- 12
- 34
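SIGILL means the compiled llama.cpp code uses CPU instructions the running process cannot execute; on Apple Silicon a frequent cause is PyCharm using an x86_64 (Rosetta) interpreter while the working Jupyter kernel is arm64. A quick diagnostic to run in both environments:
import platform, sys

# compare the output across Jupyter and PyCharm; a mismatch explains the SIGILL
print(sys.executable)      # which interpreter each environment actually uses
print(platform.machine())  # 'arm64' vs 'x86_64'

# if PyCharm reports x86_64 on an Apple Silicon Mac, point it at an arm64
# interpreter and rebuild the binding there (shell):
# pip install --force-reinstall --no-cache-dir llama-cpp-python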
0
votes
0 answers
CMAKE LLAMA CPP Binding PIP Installation giving error
Trying to install llama-cpp-python with Metal on a Mac, as described in the link below. However, it gives the error shown in the screenshot. Could someone help?
https://python.langchain.com/docs/integrations/llms/llamacpp

Jason
- 676
- 1
- 12
- 34
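For reference, the Metal build recipe from the linked LangChain page, plus a prerequisite that often causes this failure; treat the exact steps as a sketch, since the screenshot with the actual error is not shown here:
xcode-select --install    # make sure the compiler toolchain is present
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --no-cache-dir llama-cpp-python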
0
votes
0 answers
Could not load Llama model from path: ./Models/llama-7b.ggmlv3.q2_K.bin. Received error Llama.__init__() got an unexpected keyword argument 'input'
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
template = """Question:…

rahularyansharma
- 11,156
- 18
- 79
- 135
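This error usually means the installed langchain and llama-cpp-python versions disagree about the Llama constructor's signature, so the wrapper forwards a keyword the client does not accept. A hedged first step: upgrade the two packages together and pass only documented arguments:
# align the wrapper and the binding (run in the shell):
# pip install -U langchain llama-cpp-python

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./Models/llama-7b.ggmlv3.q2_K.bin",  # path from the question title
    n_ctx=2048,
)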
0
votes
0 answers
How to use decapoda-research/llama-7b-hf with a LoRA fine-tune in llama.cpp?
I have fine-tuned the decapoda-research/llama-7b-hf model with the tool https://github.com/zetavg/LLaMA-LoRA-Tuner.
Now I am trying to use it in llama.cpp, following this tutorial: https://github.com/ggerganov/llama.cpp/discussions/1166
As far as I know, I need…

Khoi V
- 612
- 8
- 13
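For reference, the rough flow from that discussion, sketched with placeholder paths: convert the LoRA adapter to GGML, then apply it at run time (a quantized base model additionally needs --lora-base):
# convert the HF LoRA adapter produced by the tuner to GGML
python convert-lora-to-ggml.py ./lora-output

# apply the adapter against an f16 base model at run time
./main -m ./models/llama-7b/ggml-model-f16.bin --lora ./lora-output/ggml-adapter-model.bin -p "Hello"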
0
votes
0 answers
How to fix 'type=value_error' when loading a wizard-vicuna model in PrivateGPT?
I'm following a tutorial to install PrivateGPT and be able to query an LLM about my local documents.
I'm using a wizard-vicuna-13B.ggmlv3.q4_1.bin model and, as per the README.md, adjusted the example.env file settings into a new .env…

pol0
- 33
- 5
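pydantic's type=value_error points at a .env field the settings model rejects; MODEL_TYPE must match a supported backend exactly. A sketch of the expected shape, with field names taken from the project's 2023 README and values as assumptions to adjust:
PERSIST_DIRECTORY=db
MODEL_TYPE=LlamaCpp
MODEL_PATH=models/wizard-vicuna-13B.ggmlv3.q4_1.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000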
-1
votes
0 answers
How can I use the GPUs more effectively on an AWS g5dn.metal instance running llama-2?
With the release of llama-2 I wanted to try this myself. To see how fast I can make this, I fired up (from time to time, briefly) a g5dn.metal instance with 96 CPU cores and 8 GPU cards. I used these two articles to…

Gunther Schadow
- 1,490
- 13
- 22
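One knob worth checking in llama-cpp-python, sketched with illustrative values: tensor_split spreads layers across the visible GPUs, though llama.cpp's layer split mostly adds memory capacity rather than linear speedup, so eight small GPUs will not be eight times faster:
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",  # placeholder
    n_gpu_layers=100,                        # offload every layer
    tensor_split=[1, 1, 1, 1, 1, 1, 1, 1],   # even split across the 8 GPUs
)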
-2
votes
0 answers
Cannot install llamacpp module provided by langchain
n_gpu_layers = 32 # Change this value based on your model and your GPU VRAM pool.
n_batch = 256 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
# Loading model,
llm = LlamaCpp(
…
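LangChain's LlamaCpp class is only a wrapper; it fails to load unless the separate llama-cpp-python package is installed first. A minimal check, assuming a pip-based environment:
# the wrapper depends on the separate binding (shell): pip install llama-cpp-python
from llama_cpp import Llama           # confirms the binding itself imports
from langchain.llms import LlamaCpp   # the wrapper can then locate it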
-2
votes
0 answers
Will Inconsistent Alternation of Responses Affect Fine-Tuning LLAMA2 with Chat History
I am working on fine-tuning LLAMA2 with a dataset containing chat history. While preparing the data, I've noticed that the dialogue doesn't always follow a pattern of alternating responses between speakers. In some cases, one person responds several…

Ivo Oostwegel
- 374
- 2
- 20
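A common preprocessing step this question implies, sketched here: collapse consecutive turns by the same speaker so the history alternates cleanly before building LLaMA-2 chat pairs (function name and message format are illustrative):
def merge_consecutive_turns(messages):
    """Collapse adjacent messages from the same speaker into a single turn."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            merged.append({"role": msg["role"], "content": msg["content"]})
    return merged

chat = [
    {"role": "user", "content": "hi"},
    {"role": "user", "content": "are you still there?"},
    {"role": "assistant", "content": "yes"},
]
print(merge_consecutive_turns(chat))
# [{'role': 'user', 'content': 'hi\nare you still there?'}, {'role': 'assistant', 'content': 'yes'}]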