I have been running inference with LLaMA-13B when the following error suddenly arose:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-7820a34f7358> in <cell line: 3>()
1 # GPU
2 lcpp_llm = None
----> 3 lcpp_llm = Llama(
4 model_path=model_path,
5 # n_gqa = 8,
/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
321 self.model_path.encode("utf-8"), self.params
322 )
--> 323 assert self.model is not None
324
325 if verbose:
AssertionError:
The error is raised by the following code:
# GPU
lcpp_llm = None
lcpp_llm = Llama(
    model_path=model_path,
    # n_gqa=8,
    n_threads=2,      # CPU threads to use
    n_ctx=4096,       # context window size
    n_batch=512,      # should be between 1 and n_ctx; consider your GPU's VRAM
    n_gpu_layers=32,  # adjust to your model and GPU VRAM pool
)
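Before digging further, it can help to confirm which llama-cpp-python build is actually installed, since model-format support changed between releases. A minimal check (assuming the package exposes a __version__ attribute, which recent releases do):

import llama_cpp

# Which llama-cpp-python release is installed? Support for the GGML vs. GGUF
# model formats depends on this version.
print(llama_cpp.__version__)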
The model_path is downloaded from the Hugging Face Hub:
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
with the base model set as:
model_name_or_path = "TheBloke/Vigogne-2-13B-Instruct-GGML"
model_basename = "vigogne-2-13b-instruct.ggmlv3.q5_1.bin"  # the model file is in GGML .bin format
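As a quick sanity check on the downloaded file, you can peek at its first four bytes: GGUF files begin with the magic b"GGUF", whereas this .ggmlv3 file should not. This is only a rough sketch; treating every non-GGUF header as legacy GGML is an assumption:

# Inspect the file header to see which container format was downloaded.
with open(model_path, "rb") as f:
    magic = f.read(4)

if magic == b"GGUF":
    print("GGUF model: requires a newer llama-cpp-python (>= 0.1.79)")
else:
    # Assumed: anything else here is a legacy GGML/GGJT container
    print(f"Header {magic!r}: likely legacy GGML, use llama-cpp-python <= 0.1.78")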
Has anyone encountered this type of error?
The following Colab notebook contains the code: https://colab.research.google.com/drive/1aRmF7r-hk9l0JTKwbwWk9LLnZOSn36Mf?usp=sharing
Using v0.1.78 of llama-cpp-python will solve the issue. Releases after v0.1.78 dropped support for the old GGML format in favor of GGUF, so a .ggmlv3 .bin file fails to load: the underlying C library returns a null model pointer, and the assert self.model is not None in Llama.__init__ fires with an empty AssertionError.
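For example, in the Colab notebook you can reinstall the pinned version before constructing Llama (the CMAKE_ARGS cuBLAS flag is the GPU build option llama-cpp-python documented at the time; drop it for a CPU-only build):

# Reinstall the last release that still reads GGML .bin files
!pip uninstall -y llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 --no-cache-dir

Alternatively, you can keep a current llama-cpp-python and download a GGUF conversion of the model instead; TheBloke publishes GGUF counterparts for most of the GGML repos.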