Questions tagged [llama]

LLaMA (Large Language Model Meta AI) is a large language model (LLM) released by Meta AI.

55 questions
2
votes
0 answers

Unable to clear GPU memory even after deleting variables when using Llama 2 model

I am having issues clearing out the GPU memory after loading the Llama 2 model into the pipeline. Clearing out the GPU memory works fine on other models (i.e. `del` variables, `torch.cuda.empty_cache()`), but it doesn't seem to work when using the Llama…
malaccan
  • 61
  • 4
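A common cause in questions like this is that the pipeline object keeps its own reference to the model, so `del model` alone never frees the VRAM; the usual recipe is to delete the pipeline too, then call `gc.collect()` and `torch.cuda.empty_cache()`. A minimal pure-Python sketch of the reference-counting issue (`FakeModel` is an illustrative stand-in, no GPU needed):

```python
import gc

class FakeModel:
    """Illustrative stand-in for a loaded model (not a real HF class)."""
    freed = False
    def __del__(self):
        FakeModel.freed = True

model = FakeModel()
pipeline = {"model": model}  # the pipeline keeps its own reference

del model        # deletes one name only; the object is still reachable
gc.collect()
assert FakeModel.freed is False

del pipeline     # drop the last reference; now the object can be freed
gc.collect()
assert FakeModel.freed is True
```

Only after every name that reaches the model (pipeline, model, cached outputs) is gone does `torch.cuda.empty_cache()` actually return the memory to the GPU.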
1
vote
1 answer

Using langchain for text to SQL using custom llm API

I am trying to use my Llama 2 model (exposed as an API using Ollama). I want to chat with the Llama agent and query my Postgres DB (i.e. generate text-to-SQL). I was able to find LangChain code that uses OpenAI to do this. However, I am unable to…
A_K
  • 81
  • 2
  • 10
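Independent of any particular LangChain helper, a text-to-SQL chain ultimately builds a prompt from the schema and the question and sends it to the LLM. A hedged sketch of that step only (`build_sql_prompt` and the schema string are illustrative, not LangChain API):

```python
def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble a text-to-SQL prompt (illustrative helper, not LangChain API)."""
    return (
        "Given the following Postgres schema:\n"
        f"{schema}\n"
        f"Write a SQL query that answers: {question}\n"
        "SQL:"
    )

schema = "CREATE TABLE users (id INT, name TEXT);"
prompt = build_sql_prompt(schema, "How many users are there?")
print(prompt)
```

The resulting prompt can then be sent to any custom endpoint (e.g. an Ollama-served Llama 2) and the returned SQL executed against Postgres.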
1
vote
0 answers

Save an LLM after adding a RAG pipeline and embedding model, and deploy it as a Hugging Face inference endpoint?

I have created a RAG (retrieval-augmented generation) pipeline and am using it with a 4-bit quantized OpenLLaMA 13B loaded directly from Hugging Face, without fine-tuning the model. First I need to save the model locally. But after using…
No Flag
  • 11
  • 3
1
vote
1 answer

Sentence embeddings from Llama 2 (Hugging Face, open source)

Could anyone let me know if there is any way of getting sentence embeddings from meta-llama/Llama-2-13b-chat-hf from Hugging Face? Model link: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf I tried using the `transformers.AutoModel` module from…
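A common approach (an assumption here — the chat model does not expose sentence embeddings natively) is to load the model with `AutoModel`, take the last hidden states, and mean-pool them over the non-padding tokens. The pooling step alone, in plain Python with toy numbers:

```python
# Toy stand-ins for last_hidden_state (seq_len x hidden_dim) and
# attention_mask (1 = real token, 0 = padding).
hidden_states = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
attention_mask = [1, 1, 0]

n_real = sum(attention_mask)
dim = len(hidden_states[0])
# Mean-pool each dimension over the real (unmasked) tokens only
sentence_embedding = [
    sum(h[d] for h, m in zip(hidden_states, attention_mask) if m) / n_real
    for d in range(dim)
]
print(sentence_embedding)  # [2.0, 3.0]
```

With real model outputs, the same pooling is done on `outputs.last_hidden_state` using the tokenizer's attention mask.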
1
vote
0 answers

Llama.generate: prefix-match hit

I am using "llama-2-7b-chat.ggmlv3.q2_K.bin" (from Hugging Face) with "LlamaCpp()" in LangChain. The "Llama.generate: prefix-match hit" process repeats many times and the model keeps answering itself, but I want only one answer. How can I set this to…
1
vote
2 answers

Why does the Llama 2 7B version work but not the 70B version?

I use something similar to here to run Llama 2. from os.path import dirname from transformers import LlamaForCausalLM, LlamaTokenizer import torch model = "/Llama-2-70b-chat-hf/" # model = "/Llama-2-7b-chat-hf/" tokenizer =…
user14094230
  • 278
  • 2
  • 9
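A likely explanation (an assumption, since the excerpt is truncated) is memory: 70B weights simply do not fit where 7B weights do. Back-of-the-envelope arithmetic, counting fp16 weights only and ignoring activations and the KV cache:

```python
BYTES_PER_PARAM_FP16 = 2  # fp16/bf16 stores each weight in 2 bytes

def weight_gb(n_params: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return n_params * BYTES_PER_PARAM_FP16 / 1e9

print(weight_gb(7e9))   # 14.0  -> fits on a single 24 GB GPU
print(weight_gb(70e9))  # 140.0 -> needs multi-GPU sharding or quantization
```

That gap is why the 70B checkpoint is usually loaded with `device_map="auto"` (sharding across devices) or in 4/8-bit quantized form.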
1
vote
2 answers

Use Llama 2 7B with Python

I would like to use Llama 2 7B locally on my Windows 11 machine with Python. I have a conda venv installed with CUDA and PyTorch with CUDA support and Python 3.10, so I am ready to go. The files are here, locally downloaded from Meta: folder…
lutz
  • 123
  • 1
  • 10
0
votes
0 answers

I connected Llama to a Discord bot but it doesn't work

This code prints this error: Repository Not Found for url: https://huggingface.co/api/models/llama-2-7b-chat.ggmlv3.q3_K_L.bin/revision/main. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a…
0
votes
0 answers

Running Llama 2 on Mac using HuggingFace

I am trying to run Llama 2 model from HuggingFace. Strangely these lines work fine on Colab, but give an error on Mac. Code: from transformers import AutoTokenizer import transformers import torch model =…
0
votes
0 answers

How to deploy Llama on AWS Kubernetes?

I'm stuck and getting many errors such as "waiting for Auto Scaling Group". I've tried debugging via AWS but nothing seems to work. I was advised to change the plan and deployed Llama 2 7B on a g5 endpoint, but you need to request the g5 virtual…
marking
  • 9
  • 1
0
votes
0 answers

Llama 2 output format is not parsed correctly

I've encountered difficulties in obtaining a solution to my inquiry after multiple attempts. I'm currently using Llama 2 in conjunction with LangChain for the first time. The challenge I'm facing pertains to extracting the response from Llama in…
Udemytur
  • 79
  • 1
  • 5
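Llama 2 chat models often echo the prompt back, so a frequent post-processing step (a sketch under that assumption; `extract_answer` is an illustrative helper, not a LangChain API) is to keep only the text after the final `[/INST]` tag:

```python
def extract_answer(raw: str, marker: str = "[/INST]") -> str:
    """Return only the text after the last [/INST] tag, if present."""
    if marker in raw:
        return raw.rsplit(marker, 1)[1].strip()
    return raw.strip()

raw = "<s>[INST] What is the capital of France? [/INST] Paris."
print(extract_answer(raw))  # Paris.
```

The same stripping can be wrapped in a LangChain output parser if the chain should return only the model's answer.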
0
votes
0 answers

Llama+LoRA: training loss drops straight to 0 on the full dataset (~14k) but is OK on sample data (10 samples)

I am trying to fine-tune the LLaMA model with Low-Rank Adaptation (LoRA) based on Hugging Face. When I train the model on the full dataset (~14k), the training loss drops to 0 and stays at 0 from epoch 2 onward. But the loss trend…
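One frequently checked culprit when the loss collapses to exactly 0 is computing loss on padding tokens; the Hugging Face convention is to set ignored label positions to -100 so the loss skips them. A minimal sketch (the token IDs are illustrative, and 0 is assumed to be the pad ID):

```python
PAD_ID = 0  # illustrative pad token id (check tokenizer.pad_token_id)

input_ids = [1, 15, 27, PAD_ID, PAD_ID]
# HF convention: label positions set to -100 are ignored by the loss
labels = [tok if tok != PAD_ID else -100 for tok in input_ids]
print(labels)  # [1, 15, 27, -100, -100]
```

If the full dataset pads much more heavily than the 10-sample run, unmasked padding can dominate the loss and drive it toward 0.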
0
votes
0 answers

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported

I am trying to run the code from this Hugging Face blog. At first I had no access to the model; that error (OSError: meta-llama/Llama-2-7b-chat-hf is not a local folder) is now solved, and I created an access token from Hugging Face which works.…
Quinten
  • 35,235
  • 5
  • 20
  • 53
0
votes
1 answer

OSError: meta-llama/Llama-2-7b-chat-hf is not a local folder

I'm trying to replicate the code from this Hugging Face blog. First I installed transformers and created a token to log in to the Hugging Face hub: pip install transformers huggingface-cli login After that, it says to use use_auth_token=True when…
Quinten
  • 35,235
  • 5
  • 20
  • 53
0
votes
1 answer

Transformers - LLAMA2 13B - KeyError / AttributeError

I'm trying to load and run the LLAMA2 13B model on my local machine, but I'm not able to test any prompts due to a KeyError / AttributeError (see attached image). My machine has the following specs: CPU: AMD® Ryzen Threadripper 3960X 24-core…
DJM
  • 45
  • 4