Questions tagged [llamacpp]

14 questions
2 votes • 1 answer

Suppress LLamaCpp stats output

How can I suppress LlamaCpp stats output in LangChain ... equivalent code: llm = LlamaCpp(model_path=..., ....) llm('who is Caesar') > who is Caesar ? Julius Caesar was a Roman general and statesman who played a critical role in the events that…
sten • 7,028
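
In recent llama-cpp-python builds, passing verbose=False to the constructor silences most of the load and timing chatter. Since llama.cpp prints its stats from C code, anything that still leaks has to be muted at the file-descriptor level rather than through sys.stderr. A minimal sketch, with a hypothetical model path:

    import os
    from contextlib import contextmanager
    from llama_cpp import Llama

    @contextmanager
    def mute_stderr():
        # Redirect the OS-level stderr (fd 2); swapping sys.stderr alone
        # won't catch output written by the C library.
        saved = os.dup(2)
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 2)
        try:
            yield
        finally:
            os.dup2(saved, 2)
            os.close(saved)
            os.close(devnull)

    llm = Llama(model_path="models/7B/ggml-model-q4_0.bin", verbose=False)
    with mute_stderr():
        out = llm("Who is Caesar?", max_tokens=48)
    print(out["choices"][0]["text"])
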
1 vote • 0 answers

Deploy app with llama-cpp-python dependency on Vercel

I can't deploy my app that requires llama-cpp-python to Vercel (sorry if it's a newbie question): (venv) bacelar@bnr:~/www/2023/python/$ vercel --force Vercel CLI 30.2.3 Inspect: https://vercel.com/ [1s] Error: Command failed: pip3.9…
cbacelar • 545
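
llama-cpp-python has no pure-Python wheel; pip compiles the C++ core at install time, so the build environment needs cmake and a C++ toolchain, and a failing pip3.9 step like the one in the log is consistent with Vercel's builder lacking them. A sketch of the files involved, assuming the @vercel/python runtime (the version pin and entrypoint are hypothetical):

    # requirements.txt — hypothetical pin
    llama-cpp-python==0.1.77

    # vercel.json — route a Python entrypoint through the Python runtime
    { "builds": [{ "src": "api/index.py", "use": "@vercel/python" }] }

If the native build still fails on the platform, a common design choice is to host the model behind a separate API and keep Vercel for the frontend, which sidesteps the compile step entirely.
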
0 votes • 2 answers

AssertionError when using llama-cpp-python in Google Colab

I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to do inference using the Llama LLM in Google Colab. My code looks like this: !pip install llama-cpp-python from llama_cpp import ChatCompletionMessage, Llama model = Llama( …
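
In many reports, a bare AssertionError from Llama.__init__ in that era of llama-cpp-python means the file at model_path is missing or isn't a format the installed version can read. A minimal Colab sketch, with a hypothetical model path, that fails with a readable message before the opaque assert can fire:

    !pip install llama-cpp-python

    import os
    from llama_cpp import Llama

    model_path = "/content/llama-2-7b-chat.Q4_K_M.gguf"  # hypothetical; download it here first
    assert os.path.exists(model_path), f"no model file at {model_path}"

    llm = Llama(model_path=model_path, n_ctx=2048)
    out = llm("Q: Who was Julius Caesar? A:", max_tokens=64)
    print(out["choices"][0]["text"])
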
0 votes • 0 answers

Inferencing LLAMA-2 13B

I have been running inference with LLaMA-13B and suddenly the following error arose: --------------------------------------------------------------------------- AssertionError Traceback (most recent call…
Malik • 29
0 votes • 1 answer

llama-cpp-python not using NVIDIA GPU CUDA

I have been playing around with oobabooga text-generation-webui on my Ubuntu 20.04 with my NVIDIA GTX 1060 6GB for some weeks without problems. I have been using llama2-chat models sharing memory between my RAM and NVIDIA VRAM. I installed without…
imbr • 6,226
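
By default pip builds llama-cpp-python CPU-only, and a CPU-only build silently ignores GPU settings; rebuilding with cuBLAS and then asking for layer offload is the usual fix. A sketch, where the model path and layer count are guesses for a 6 GB GTX 1060:

    # Rebuild with CUDA support (shell):
    #   CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    #     pip install --force-reinstall --no-cache-dir llama-cpp-python

    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-7b-chat.ggmlv3.q4_0.bin",  # hypothetical path
        n_gpu_layers=20,  # raise until VRAM is nearly full; watch nvidia-smi
    )
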
0 votes • 0 answers

error loading model: MapViewOfFile failed: Not enough memory resources are available to process this command

PC specs: Ryzen 5700X, 32 GB RAM, 100 GB free SSD space, RTX 3060 with 12 GB VRAM. I'm trying to run the llama-2-7b-chat model locally. I followed every instruction step and first converted the model to ggml FP16 format: python convert.py .\models\llama-2-7b-chat\…
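
MapViewOfFile is the Windows mmap call; it fails here because mapping the roughly 13 GB FP16 file on top of everything else exhausts the commit limit. Quantizing shrinks a 7B model to around 4 GB, and mmap can be bypassed entirely. A sketch, with paths following the question's layout:

    # Quantize after convert.py (shell, from the llama.cpp build directory):
    #   quantize .\models\llama-2-7b-chat\ggml-model-f16.bin ^
    #            .\models\llama-2-7b-chat\ggml-model-q4_0.bin q4_0

    from llama_cpp import Llama

    llm = Llama(
        model_path=r".\models\llama-2-7b-chat\ggml-model-q4_0.bin",
        use_mmap=False,  # plain reads instead of MapViewOfFile, if mapping keeps failing
    )
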
0 votes • 0 answers

LLAMA CPP Python binding raises SIGILL Error on Pycharm and terminal while the same code works perfectly fine on Anaconda Jupyter Notebook

I am trying to run quantised LLAMA2 models on my Mac, referring to the link above. When I run the code below in a Jupyter notebook, it works fine and gives the expected output. However, it raises a SIGILL error when run from PyCharm or the terminal. Could…
Jason • 676
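
SIGILL means the binary executed an instruction the CPU doesn't support, which on Apple Silicon typically points at a wheel compiled for the wrong architecture: Jupyter may be running a native arm64 Python while PyCharm's interpreter runs x86_64 under Rosetta. A sketch for checking, assuming an Apple Silicon machine:

    import platform, sys

    # 'arm64' for a native interpreter, 'x86_64' under Rosetta
    print(sys.executable, platform.machine())

    # If PyCharm's interpreter differs from Jupyter's, rebuild the wheel
    # inside that interpreter (shell):
    #   pip uninstall -y llama-cpp-python
    #   pip install --no-cache-dir llama-cpp-python
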
0 votes • 0 answers

CMAKE LLAMA CPP Binding PIP Installation giving error

I'm trying to install llama-cpp-python on a Mac using Metal, as described in the guide below. However, it gives the following error, shown in the screenshot. Could someone help? https://python.langchain.com/docs/integrations/llms/llamacpp
Jason • 676
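
The linked LangChain page builds the wheel from source with Metal enabled, which requires cmake and the Xcode command-line tools to be present; in most reports that is where this install falls over. The documented install line plus a quick import check, as a sketch:

    # Shell:
    #   xcode-select --install   # compiler toolchain, if not already present
    #   CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python

    import llama_cpp
    print(llama_cpp.__version__)  # a clean import means the native build loaded
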
0 votes • 0 answers

Could not load Llama model from path: ./Models/llama-7b.ggmlv3.q2_K.bin. Received error Llama.__init__() got an unexpected keyword argument 'input'

from langchain.llms import LlamaCpp from langchain import PromptTemplate, LLMChain from langchain.callbacks.manager import CallbackManager from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler template = """Question:…
rahularyansharma • 11,156
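
"Llama.__init__() got an unexpected keyword argument" usually signals version skew: the installed LangChain is passing a parameter the installed llama-cpp-python doesn't accept. Upgrading the two together and constructing the chain as in the question's imports is the usual path; a sketch, with the model path taken from the error message:

    # Shell: pip install -U langchain llama-cpp-python

    from langchain.llms import LlamaCpp
    from langchain import PromptTemplate, LLMChain
    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    llm = LlamaCpp(
        model_path="./Models/llama-7b.ggmlv3.q2_K.bin",
        callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
        verbose=True,
    )
    prompt = PromptTemplate(template="Question: {question}\nAnswer:",
                            input_variables=["question"])
    chain = LLMChain(prompt=prompt, llm=llm)
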
0 votes • 0 answers

How to use decapoda-research / llama-7b-hf with fine tuning LoRA in LLaMA.cpp?

I have fine-tuned decapoda-research/llama-7b-hf with the tool https://github.com/zetavg/LLaMA-LoRA-Tuner. Now I'm trying to use it in llama.cpp, following this tutorial: https://github.com/ggerganov/llama.cpp/discussions/1166 As far as I know, I need…
Khoi V • 612
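
The linked discussion's workflow is: convert the LoRA adapter to ggml with llama.cpp's conversion script, then apply it on top of a base ggml model at load time. A sketch with hypothetical paths:

    # Shell (from the llama.cpp checkout):
    #   python convert-lora-to-ggml.py path/to/lora-output/
    #   ./main -m models/7B/ggml-model-q4_0.bin \
    #          --lora path/to/lora-output/ggml-adapter-model.bin

    # The Python binding exposes the same knobs:
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/7B/ggml-model-q4_0.bin",
        lora_path="path/to/lora-output/ggml-adapter-model.bin",
    )
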
0 votes • 0 answers

How to fix 'type=value_error' when loading a wizard-vicuna model in PrivateGPT?

I'm following a tutorial to install PrivateGPT and be able to query an LLM about my local documents. I'm using a wizard-vicuna-13B.ggmlv3.q4_1.bin model, and as per the README.md I adjusted the example.env file settings into a new .env…
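
pydantic reports type=value_error when a settings field fails validation, so the first suspects are the keys in the new .env file. A sketch along the lines of PrivateGPT's example.env, where every value except the model filename from the question is hypothetical:

    PERSIST_DIRECTORY=db
    MODEL_TYPE=LlamaCpp   # must be a value privateGPT recognises (LlamaCpp or GPT4All)
    MODEL_PATH=models/wizard-vicuna-13B.ggmlv3.q4_1.bin
    EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
    MODEL_N_CTX=1000
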
-1 votes • 0 answers

How can I use the GPUs more effectively on an AWS g5dn.metal instance running llama-2?

With the release of llama-2 I wanted to try this myself. To see how fast I can make this, I fired up (from time to time, briefly) a g5dn.metal instance with 96 CPU cores and 8 GPU cards. I used these two articles to…
Gunther Schadow • 1,490
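
With llama.cpp's cuBLAS build, multi-GPU utilisation hinges on two knobs: how many layers are offloaded at all (n_gpu_layers) and how the weights are divided among cards (tensor_split, or --tensor-split on the CLI). A sketch in llama-cpp-python, with a hypothetical model:

    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-13b-chat.ggmlv3.q4_0.bin",  # hypothetical
        n_gpu_layers=40,         # offload all 40 layers of a 13B model
        tensor_split=[1.0] * 8,  # share the weights evenly across the 8 GPUs
    )
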
-2 votes • 0 answers

Cannot install llamacpp module provided by langchain

n_gpu_layers = 32 # Change this value based on your model and your GPU VRAM pool. n_batch = 256 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU. # Loading model, llm = LlamaCpp( …
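
LangChain's LlamaCpp class is only a wrapper; the llama-cpp-python package has to be installed separately, or the import and constructor fail. A runnable version of the excerpt above, with a hypothetical model path:

    # Shell: pip install llama-cpp-python

    from langchain.llms import LlamaCpp

    n_gpu_layers = 32  # change based on your model and your GPU VRAM pool
    n_batch = 256      # should be between 1 and n_ctx; consider your GPU's VRAM

    llm = LlamaCpp(
        model_path="models/llama-2-7b-chat.ggmlv3.q4_0.bin",
        n_gpu_layers=n_gpu_layers,
        n_batch=n_batch,
    )
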
-2 votes • 0 answers

Will Inconsistent Alternation of Responses Affect Fine-Tuning LLAMA2 with Chat History

I am working on fine-tuning LLAMA2 with a dataset containing chat history. While preparing the data, I've noticed that the dialogue doesn't always follow a pattern of alternating responses between speakers. In some cases, one person responds several…
Ivo Oostwegel • 374
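
Llama-2's chat template assumes strictly alternating user/assistant turns, so one common pre-processing choice, though not the only valid one, is to merge consecutive messages from the same speaker before building training examples. A sketch:

    def merge_consecutive_turns(messages):
        """Collapse runs of same-role messages: [{'role': ..., 'content': ...}, ...]"""
        merged = []
        for msg in messages:
            if merged and merged[-1]["role"] == msg["role"]:
                # same speaker twice in a row: fold into the previous turn
                merged[-1]["content"] += "\n" + msg["content"]
            else:
                merged.append(dict(msg))
        return merged

    history = [
        {"role": "user", "content": "hi"},
        {"role": "user", "content": "are you there?"},
        {"role": "assistant", "content": "Hello! Yes."},
    ]
    print(merge_consecutive_turns(history))  # two turns instead of three
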