
I would like to use Llama 2 7B locally on my Win 11 machine with Python. I have a conda venv set up with CUDA, PyTorch with CUDA support, and Python 3.10, so I am ready to go.

The files are here locally, downloaded from Meta: folder llama-2-7b-chat with:

  • checklist.chk
  • consolidated.00.pth
  • params.json

Now I would like to interact with the model, but I only find code snippets that download the model from Hugging Face, which is not needed in my case.

Can someone provide me with a few lines of code to interact with the model via Python?

– lutz
  • I found some additional info at this repository: https://github.com/facebookresearch/llama I added the "tokenizer.model" and installed the additional dependencies. But I get several errors regarding NCCL, Kubernetes etc., so I guess that is not meant for my use case – lutz Aug 05 '23 at 14:50
  • Read the readme of that repo again, you shall find [llama-recipes](https://github.com/facebookresearch/llama-recipes/) (under the title, 3rd paragraph) which is the code example. – dinhanhx Aug 05 '23 at 15:20

2 Answers


I know you mentioned Hugging Face is unnecessary in your case, but to load and use the model it's much easier to go through their transformers library.

After you download the weights, you need to restructure the folder as follows (notice I moved 3 of the files under 7B):

├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── config.json
├── generation_config.json
├── LICENSE
├── tokenizer_checklist.chk
├── tokenizer.model
└── USE_POLICY.md
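
If you prefer to script the restructuring instead of moving files by hand, here is a minimal sketch (the llama-2-7b root folder name is an assumption, matching the conversion command below):

import pathlib
import shutil

# Hypothetical root folder holding the Meta download plus the tokenizer files
base = pathlib.Path("llama-2-7b")
target = base / "7B"
target.mkdir(exist_ok=True)

# Move the three per-model files under 7B, as in the tree above
for name in ("checklist.chk", "consolidated.00.pth", "params.json"):
    shutil.move(str(base / name), str(target / name))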

Next, download the conversion script from here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

And finally run this script:

python convert_llama_weights_to_hf.py --input_dir llama-2-7b/ --model_size 7B --output_dir model

Once it's finished, you can load the model as follows:

from transformers import LlamaForCausalLM, LlamaTokenizer

# Point both at the --output_dir used in the conversion step
tokenizer = LlamaTokenizer.from_pretrained("./model")
model = LlamaForCausalLM.from_pretrained("./model")

You can then learn more on how to prompt the model here: https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/llama2#transformers.LlamaForCausalLM.forward.example
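
For example, a minimal prompt/generate round trip might look like this ("./model" is the --output_dir from above; the prompt and sampling settings are only illustrative):

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

# Load the converted weights; float16 keeps the 7B model within typical GPU memory
tokenizer = LlamaTokenizer.from_pretrained("./model")
model = LlamaForCausalLM.from_pretrained("./model", torch_dtype=torch.float16).to("cuda")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate up to 64 new tokens; sampling settings here are illustrative only
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))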

– KevinCoder

Not all of the downloaded files are needed. I got it to work using the CUDA GPU on Win 11, but in a slightly different way:

  1. First of all, I used this repo and not the code provided by Meta itself (but I had to download the files via Hugging Face): https://github.com/oobabooga/text-generation-webui

  2. The CUDA installation via conda had some errors, even though everything looked fine at first. I could solve this by installing the stack as provided here (a quick sanity check is sketched below): https://github.com/jeffheaton/t81_558_deep_learning/blob/master/install/manual_setup2.ipynb
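
A quick sanity check for the CUDA stack after such a reinstall (generic PyTorch, not specific to either repo):

import torch

# Confirm PyTorch was built with CUDA and can see the GPU
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())          # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of your NVIDIA card
    x = torch.rand(3, 3, device="cuda")   # allocate a small tensor on the GPU
    print((x @ x).sum().item())           # run a tiny matmul to confirm it works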

I hope that helps others as well ...

– lutz