
I have been running inference with LLaMA-13B when the following error suddenly arose:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-7820a34f7358> in <cell line: 3>()
      1 # GPU
      2 lcpp_llm = None
----> 3 lcpp_llm = Llama(
      4     model_path=model_path,
      5     # n_gqa = 8,

/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
    321                     self.model_path.encode("utf-8"), self.params
    322                 )
--> 323         assert self.model is not None
    324 
    325         if verbose:

AssertionError: 

The error is raised by this code:

# GPU
lcpp_llm = None
lcpp_llm = Llama(
    model_path=model_path,
    # n_gqa=8,
    n_threads=2,     # CPU cores
    n_ctx=4096,
    n_batch=512,     # should be between 1 and n_ctx; consider the amount of VRAM in your GPU
    n_gpu_layers=32  # change this value based on your model and your GPU VRAM pool
)

model_path is defined as:

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

and the base model as:

model_name_or_path = "TheBloke/Vigogne-2-13B-Instruct-GGML"
model_basename = "vigogne-2-13b-instruct.ggmlv3.q5_1.bin" # the model is in bin format

Is anyone aware of this type of error?

The following Colab notebook contains the code: https://colab.research.google.com/drive/1aRmF7r-hk9l0JTKwbwWk9LLnZOSn36Mf?usp=sharing

Edit: Using v0.1.78 of llama-cpp-python solves the issue.
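
For anyone hitting the same error, pinning the version in the Colab notebook looks roughly like this (restart the runtime after installing):

!pip install llama-cpp-python==0.1.78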

  • It looks like it is not recognizing the path to the model `model_path`. Very odd, as the code looks perfectly OK. You may want to test it with a pathlib object or by feeding the URL directly (replace `hf_hub_download` with `hf_hub_url`) so that the model caches the weights directly (a rough sketch of the pathlib check is below the comments) – max Aug 26 '23 at 07:49
  • Oh, I edited the post shortly after for anyone to see; I used the previous version of the library. I'll try yours – Malik Aug 31 '23 at 12:47
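
A sketch of the pathlib check suggested in the comment above (model_path is the string returned by hf_hub_download; the rest is illustrative, not from the original notebook):

from pathlib import Path

p = Path(model_path)
print(p.exists())        # confirm the downloaded file is really there

lcpp_llm = Llama(
    model_path=str(p),   # Llama expects a string path (it calls .encode("utf-8") on it)
    n_threads=2,
    n_ctx=4096,
    n_batch=512,
    n_gpu_layers=32
)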
