Use this tag for questions about large language models (LLMs): deep-learning models trained to interpret and generate natural-language text.
Questions tagged [large-language-model]
118 questions
5 votes · 0 answers
Starcoder finetuning - How to select the GPU and how to estimate the time it will take to finetune
I'd like to finetune Starcoder (https://huggingface.co/bigcode/starcoder) on my dataset and on a GCP VM instance.
It says in the documentation that for training the model, they used 512 Tesla A100 GPUs and it took 24 days.
I also saw the model…

Aadesh · 403 · 3 · 13
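A back-of-the-envelope memory estimate is usually the first step in choosing between a single GPU and a multi-GPU setup. A minimal sketch, assuming standard mixed-precision Adam rules of thumb (roughly 2 bytes per parameter for fp16 weights, 2 for gradients, and about 12 for fp32 optimizer states and master weights; these figures are not from the StarCoder docs):

# Rough GPU memory needed for fine-tuning; activations and framework overhead
# come on top and depend on batch size and sequence length.
def full_finetune_memory_gb(params_billion: float) -> float:
    # fp16 weights (2 B) + fp16 grads (2 B) + fp32 Adam states and master weights (~12 B)
    return params_billion * (2 + 2 + 12)

def lora_finetune_memory_gb(params_billion: float, trainable_fraction: float = 0.01) -> float:
    # frozen fp16 base weights, plus grads/optimizer states only for the small adapters
    return params_billion * 2 + params_billion * trainable_fraction * (2 + 2 + 12)

print(f"StarCoder (~15.5B) full fine-tune: ~{full_finetune_memory_gb(15.5):.0f} GB")
print(f"StarCoder (~15.5B) LoRA fine-tune: ~{lora_finetune_memory_gb(15.5):.0f} GB")

The full-fine-tune figure (~248 GB) is why multi-GPU setups or parameter-efficient methods are typically needed for a 15B-parameter model.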
5 votes · 2 answers
Figuring out general specs for running LLM models
I have three questions:
Given the number of LLM parameters in billions, how can you figure out how much GPU RAM you need to run the model?
If you have enough CPU RAM (i.e. no GPU), can you run the model, even if it is slow?
Can you run LLM models (like…

sten · 7,028 · 9 · 41 · 63
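For the first question, a common rule of thumb (my sketch, not from an answer to this question) is that the memory needed just to hold the weights is the parameter count times the bytes per parameter for the chosen precision, with the KV cache and activations as extra overhead:

# Rough RAM/VRAM needed to hold the weights at inference time.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billion: float, precision: str = "fp16") -> float:
    return params_billion * BYTES_PER_PARAM[precision]

for precision in ("fp32", "fp16", "int8", "int4"):
    print(f"7B model, {precision}: ~{weight_memory_gb(7, precision):.1f} GB (+ KV cache and activations)")

The same arithmetic applies to CPU RAM: if the weights fit, the model can run on CPU, just much more slowly.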
4 votes · 1 answer
Difference between Instruction Tuning vs Non Instruction Tuning Large Language Models
What is the difference between instruction tuning and normal fine-tuning for large language models?
Also, the instruction tuning I'm referring to isn't the in-context/prompt kind.
All the recent papers about fine-tuning seem to be about instruction…

Flo · 51 · 1 · 4
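The practical difference usually shows up in the training data. A minimal, hypothetical illustration (Alpaca-style template; the example texts are placeholders, not from the question): standard fine-tuning trains the model to continue raw text, while instruction tuning trains it on instruction/response pairs rendered through a fixed prompt template.

# Standard (causal) fine-tuning: the model simply learns to continue raw domain text.
plain_example = "The return policy allows refunds within 30 days of purchase."

# Instruction tuning: each example is an instruction/response pair wrapped in a
# template, so the model learns to follow instructions at inference time.
instruction_example = {
    "instruction": "Summarize the return policy in one sentence.",
    "output": "Refunds are available within 30 days of purchase.",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

print(PROMPT_TEMPLATE.format(**instruction_example))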
3 votes · 4 answers
how to create a langchain doc from an str
I've searched all over the langchain documentation on their official website, but I didn't find how to create a langchain doc from a str variable in Python, so I searched in their GitHub code and found this:
doc=Document(
…

Mohamed Amine · 340 · 1 · 4 · 16
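A minimal sketch of constructing a Document directly from a Python string (field names as in the 2023-era langchain API; verify against the installed version):

from langchain.docstore.document import Document

text = "Some raw text read or scraped from anywhere."
doc = Document(
    page_content=text,             # the actual text of the document
    metadata={"source": "inline"}  # optional metadata such as an origin or URL
)
print(doc.page_content, doc.metadata)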
3 votes · 1 answer
Finetuning a LM vs prompt-engineering an LLM
Is it possible to finetune a much smaller language model like Roberta on, say, a customer service dataset and get results as good as one might get by prompting GPT-4 with parts of the dataset?
Can a fine-tuned Roberta model learn to follow…

Tolu · 1,081 · 1 · 8 · 23
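As a point of comparison for the smaller-model route, a minimal sketch of fine-tuning RoBERTa as a customer-service intent classifier (the dataset and labels are hypothetical). Note that RoBERTa is encoder-only, so it suits classification or extraction tasks rather than free-form instruction following:

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

data = Dataset.from_dict({
    "text": ["Where is my order?", "I want a refund", "How do I reset my password?"],
    "label": [0, 1, 2],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-cs", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()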
3 votes · 1 answer
Comparing methods for a QA system on a 1,000-document Markdown dataset: Indexes and embeddings with GPT-4 vs. retraining GPT4ALL (or similar)
I am working on a project to build a question-answering system for a documentation portal containing over 1,000 Markdown documents, with each document consisting of approximately 2,000-4,000 tokens.
I am considering the following two options:
Using…

Vasil Remeniuk · 20,519 · 6 · 71 · 81
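For the index-and-embeddings option, a minimal sketch of the usual pattern (2023-era langchain imports with OpenAI models as placeholders; any embedding model and vector store can be substituted):

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Load the Markdown files and split them into chunks small enough to embed.
docs = DirectoryLoader("docs/", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks once and store them in a local FAISS index.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# At query time, retrieve the top-k chunks and let the LLM answer from them only.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("How do I configure authentication?"))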
3 votes · 1 answer
How to compute sentence level perplexity from hugging face language models?
I have a large collection of documents, each consisting of ~10 sentences. For each document, I wish to find the sentence that maximises perplexity, or equivalently the loss from a fine-tuned causal LM. I have decided to use Hugging Face and the…

pilu · 720 · 5 · 16
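A minimal sketch of the standard recipe ("gpt2" is a placeholder for the fine-tuned model): run each sentence through the causal LM with the input ids as labels, take the mean cross-entropy loss, and exponentiate it to get perplexity.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_perplexity(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids makes the model return the mean token-level
        # cross-entropy loss (it shifts the labels internally).
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

document = ["The cat sat on the mat.", "Colourless green ideas sleep furiously."]
print(max(document, key=sentence_perplexity))   # sentence with the highest perplexity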
2 votes · 1 answer
Backpropagation / minibatching in training large language models (LLMs)
I am struggling to understand how backprop works for transformer-based LLMs.
Here is my guess of how this process works. Given a sequence of tokens of length 64, we process the sequence in parallel using teacher forcing (i.e., for each ACTUAL…

Chinmaya Andukuri · 21 · 1
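A minimal sketch of one training step with a Hugging Face causal LM (tiny hypothetical batch): all positions are predicted in parallel from the ground-truth prefix (teacher forcing), the per-position losses are averaged, and a single backward pass per minibatch updates the weights.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(["the quick brown fox jumps over the lazy dog"], return_tensors="pt")

# Labels are the inputs themselves; the model shifts them internally, so position t
# is trained to predict token t+1 given the ACTUAL tokens 0..t (teacher forcing).
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()   # one backward pass for the whole minibatch
optimizer.step()
optimizer.zero_grad()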
2 votes · 0 answers
How to finetune an LLM model on your own codebase?
I have 10 code repositories in JavaScript (Vue.js); each repository corresponds to one theme.
I want to train an LLM model on these 10 code repositories so that I can generate new themes using prompts.
The LLM model should take the context of the 10 code…

Aadesh · 403 · 3 · 13
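A minimal sketch of the usual first step (paths and file extensions are hypothetical): collect the repository files into a text dataset that a causal-LM fine-tuning script, whether a full fine-tune or LoRA, can consume.

from pathlib import Path
from datasets import Dataset

REPO_DIRS = [f"themes/theme-{i}" for i in range(1, 11)]   # hypothetical repo locations
EXTENSIONS = {".vue", ".js", ".css", ".scss"}

records = []
for repo in REPO_DIRS:
    for path in Path(repo).rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            records.append({
                "repo": repo,
                # Prefixing each file with its path helps the model associate
                # code with its location inside a theme.
                "text": f"// FILE: {path}\n{path.read_text(errors='ignore')}",
            })

dataset = Dataset.from_list(records)
dataset.save_to_disk("theme_code_dataset")   # next: tokenize and fine-tune a causal LM on "text"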
2 votes · 1 answer
How can I load scraped page content to langchain VectorstoreIndexCreator
I have a function which goes to a url and crawls its content (+ from subpages). Then I want to load the text content to langchain VectorstoreIndexCreator(). How can I do it via a loader? I could not find any suitable loader in langchain.document_loaders.…

PetrSevcik · 89 · 1 · 9
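A minimal sketch of one way around the missing loader (2023-era langchain API; scrape_site stands in for the asker's crawler): wrap the scraped strings in Document objects and pass them to the index creator's from_documents method (check that the installed version exposes it), so no loader class is needed at all.

from langchain.docstore.document import Document
from langchain.indexes import VectorstoreIndexCreator

def scrape_site(url: str) -> list[str]:
    # Stand-in for the existing crawler that returns the page texts.
    return ["page one text ...", "subpage text ..."]

pages = scrape_site("https://example.com")
docs = [Document(page_content=text, metadata={"source": "https://example.com"})
        for text in pages]

index = VectorstoreIndexCreator().from_documents(docs)
print(index.query("What does the site say about pricing?"))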
2 votes · 2 answers
In Langchain, why is ConversationalRetrievalChain not remembering the chat history and entering a new ConversationalRetrievalChain chain for each chat?
I am trying to create a customer support system using langchain. I am using text documents as an external knowledge provider via TextLoader.
In order to remember the chat, I am using ConversationalRetrievalChain with a list of chats.
My problem is, each time…

RagAnt · 1,064 · 2 · 17 · 35
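A minimal sketch of the pattern that usually resolves this (2023-era langchain API, placeholder file and model names): build the chain once, outside the chat loop, and attach a ConversationBufferMemory instead of constructing a new chain per message.

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

docs = TextLoader("support_docs.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents(docs)
retriever = FAISS.from_documents(chunks, OpenAIEmbeddings()).as_retriever()

# The memory holds the chat history; the chain must be created ONCE and reused
# for every turn, otherwise the history is lost on each request.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
    memory=memory,
)

print(chain({"question": "How do I reset my password?"})["answer"])
print(chain({"question": "And what if that doesn't work?"})["answer"])  # remembers the first turn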
2 votes · 1 answer
How to use cross-encoder with Huggingface transformers pipeline?
There's a set of models on the Hugging Face hub that come from the sentence_transformers library, e.g. https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
The suggested usage examples are:
# Using sentence_transformers
from…

alvas · 115,346 · 109 · 446 · 738
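A minimal sketch of the plain-transformers route (my reading of this kind of model card, not an official pipeline recipe): load the cross-encoder as a sequence-classification model, tokenize each (query, passage) pair jointly, and read the single logit as the relevance score.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of about 3.7 million inhabitants.",
    "The quick brown fox jumps over the lazy dog.",
]

# The cross-encoder scores each (query, passage) pair jointly in one forward pass.
inputs = tokenizer([query] * len(passages), passages,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

for passage, score in zip(passages, scores.tolist()):
    print(f"{score:.3f}  {passage}")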
2 votes · 1 answer
Further finetune a Peft/LoRA finetuned CausalLM Model
I am a bit unsure how to proceed regarding the topic mentioned in the title.
The baseline is a model created via Hugging Face's library as an AutoModelForCausalLM model, fine-tuned with PEFT and a LoRA approach, with the weights subsequently merged.
I now want to further fine…

Julian Gerhard · 86 · 1 · 4
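A minimal sketch of one reasonable route (the model path and LoRA hyperparameters are placeholders): treat the merged checkpoint as a new base model and attach a fresh LoRA adapter to it for the second round of training.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# The previously fine-tuned model with the first LoRA weights already merged in.
base = AutoModelForCausalLM.from_pretrained("path/to/merged-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/merged-model")

# Attach a brand-new LoRA adapter on top of the merged weights and train only it.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to the architecture's module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ... continue with the usual training loop, then merge_and_unload() again if desired.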
2 votes · 2 answers
Alpaca Large Language Model from Python script
I was able to install Alpaca under Linux and start and use it interactively via the corresponding ./chat command.
However, I would like to run it not in interactive mode but from a Python (Jupyter) script with the prompt as a string parameter. Also,…

DJRDS · 27 · 5
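A minimal sketch of one way to drive the binary from Python with subprocess. This assumes the ./chat build reads a prompt from stdin and writes its reply to stdout, which may not hold for every build; check ./chat --help for a non-interactive prompt flag (llama.cpp-derived builds often accept -p).

import subprocess

def ask_alpaca(prompt: str, timeout: int = 120) -> str:
    # Assumption: the chat binary accepts the prompt on stdin and prints the
    # completion to stdout; adjust the arguments to match the build's options.
    result = subprocess.run(
        ["./chat"],
        input=prompt + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout

print(ask_alpaca("Explain what a large language model is in one sentence."))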
2 votes · 0 answers
Training huggingface's GPT2 from scratch: how to implement causal mask?
I am trying to train huggingface's implementation of the GPT2 model from scratch (meaning I am using their architecture but not using pre-trained weights) but I noticed by looking into the code here…

Johncowk · 342 · 1 · 16
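A minimal sketch of training GPT-2 from scratch with the Hugging Face architecture: instantiate the model from a config (random weights) rather than with from_pretrained. The causal mask is built into GPT2Attention as a registered lower-triangular buffer, so it is never passed in; the attention_mask argument only masks padding tokens.

from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")   # reuse the vocab, not the weights

# Randomly initialised GPT-2: same architecture, no pre-trained weights.
config = GPT2Config(vocab_size=tokenizer.vocab_size, n_positions=512,
                    n_embd=256, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)

batch = tokenizer(["hello world, this is a causal LM"], return_tensors="pt")
# No causal mask is passed here: the attention layers apply their own triangular
# mask internally; attention_mask from the tokenizer only hides padding.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)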