Use this tag for questions about large language models (LLMs): deep-learning models trained to interpret and generate natural-language text.
Questions tagged [large-language-model]
118 questions
5 votes · 0 answers
Starcoder finetuning - How to select the GPU and how to estimate the time it will take to finetune
I'd like to finetune Starcoder (https://huggingface.co/bigcode/starcoder) on my dataset and on a GCP VM instance.
It says in the documentation that for training the model, they used 512 Tesla A100 GPUs and it took 24 days.
I also saw the model…

Aadesh · 403 · 3 · 13
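A back-of-the-envelope memory estimate is usually the first step in choosing between a single GPU and a multi-GPU setup. A minimal sketch, assuming standard mixed-precision Adam rules of thumb (roughly 2 bytes per parameter for fp16 weights, 2 for gradients, and about 12 for fp32 optimizer states and master weights; these figures are not from the StarCoder docs):

# Rough GPU memory needed for fine-tuning; activations and framework overhead
# come on top and depend on batch size and sequence length.
def full_finetune_memory_gb(params_billion: float) -> float:
    # fp16 weights (2 B) + fp16 grads (2 B) + fp32 Adam states and master weights (~12 B)
    return params_billion * (2 + 2 + 12)

def lora_finetune_memory_gb(params_billion: float, trainable_fraction: float = 0.01) -> float:
    # frozen fp16 base weights, plus grads/optimizer states only for the small adapters
    return params_billion * 2 + params_billion * trainable_fraction * (2 + 2 + 12)

print(f"StarCoder (~15.5B) full fine-tune: ~{full_finetune_memory_gb(15.5):.0f} GB")
print(f"StarCoder (~15.5B) LoRA fine-tune: ~{lora_finetune_memory_gb(15.5):.0f} GB")

The full-fine-tune figure (~248 GB) is why multi-GPU setups or parameter-efficient methods are typically needed for a 15B-parameter model.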
5 votes · 2 answers
Figuring out general specs for running LLM models
I have three questions:
Given the number of LLM parameters in billions, how can you figure out how much GPU RAM you need to run the model?
If you have enough CPU RAM (i.e. no GPU), can you run the model, even if it is slow?
Can you run LLM models (like…

sten · 7,028 · 9 · 41 · 63
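For the first question, a common rule of thumb (my sketch, not from an answer to this question) is that the memory needed just to hold the weights is the parameter count times the bytes per parameter for the chosen precision, with the KV cache and activations as extra overhead:

# Rough RAM/VRAM needed to hold the weights at inference time.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billion: float, precision: str = "fp16") -> float:
    return params_billion * BYTES_PER_PARAM[precision]

for precision in ("fp32", "fp16", "int8", "int4"):
    print(f"7B model, {precision}: ~{weight_memory_gb(7, precision):.1f} GB (+ KV cache and activations)")

The same arithmetic applies to CPU RAM: if the weights fit, the model can run on CPU, just much more slowly.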
4 votes · 1 answer
Difference between Instruction Tuning vs Non Instruction Tuning Large Language Models
What is the difference between instruction tuning and normal fine-tuning for large language models?
Also, the instruction tuning I'm referring to isn't the in-context/prompt kind.
All the recent papers about fine-tuning seem to be about instruction…

Flo · 51 · 1 · 4
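The practical difference usually shows up in the training data. A minimal, hypothetical illustration (Alpaca-style template; the example texts are placeholders, not from the question): standard fine-tuning trains the model to continue raw text, while instruction tuning trains it on instruction/response pairs rendered through a fixed prompt template.

# Standard (causal) fine-tuning: the model simply learns to continue raw domain text.
plain_example = "The return policy allows refunds within 30 days of purchase."

# Instruction tuning: each example is an instruction/response pair wrapped in a
# template, so the model learns to follow instructions at inference time.
instruction_example = {
    "instruction": "Summarize the return policy in one sentence.",
    "output": "Refunds are available within 30 days of purchase.",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

print(PROMPT_TEMPLATE.format(**instruction_example))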
3 votes · 4 answers
how to create a langchain doc from an str
I've searched all over the langchain documentation on their official website, but I didn't find how to create a langchain doc from a str variable in Python, so I searched in their GitHub code and found this:
doc=Document(
…

Mohamed Amine · 340 · 1 · 4 · 16
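A minimal sketch of constructing a Document directly from a Python string (field names as in the 2023-era langchain API; verify against the installed version):

from langchain.docstore.document import Document

text = "Some raw text read or scraped from anywhere."
doc = Document(
    page_content=text,             # the actual text of the document
    metadata={"source": "inline"}  # optional metadata such as an origin or URL
)
print(doc.page_content, doc.metadata)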
3 votes · 1 answer
Finetuning a LM vs prompt-engineering an LLM
Is it possible to finetune a much smaller language model like Roberta on, say, a customer service dataset and get results as good as one might get by prompting GPT-4 with parts of the dataset?
Can a fine-tuned Roberta model learn to follow…

Tolu · 1,081 · 1 · 8 · 23
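As a point of comparison for the smaller-model route, a minimal sketch of fine-tuning RoBERTa as a customer-service intent classifier (the dataset and labels are hypothetical). Note that RoBERTa is encoder-only, so it suits classification or extraction tasks rather than free-form instruction following:

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

data = Dataset.from_dict({
    "text": ["Where is my order?", "I want a refund", "How do I reset my password?"],
    "label": [0, 1, 2],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-cs", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()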
3 votes · 1 answer
Comparing methods for a QA system on a 1,000-document Markdown dataset: Indexes and embeddings with GPT-4 vs. retraining GPT4ALL (or similar)
I am working on a project to build a question-answering system for a documentation portal containing over 1,000 Markdown documents, with each document consisting of approximately 2,000-4,000 tokens.
I am considering the following two options:
Using…

Vasil Remeniuk · 20,519 · 6 · 71 · 81
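For the index-and-embeddings option, a minimal sketch of the usual pattern (2023-era langchain imports with OpenAI models as placeholders; any embedding model and vector store can be substituted):

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Load the Markdown files and split them into chunks small enough to embed.
docs = DirectoryLoader("docs/", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks once and store them in a local FAISS index.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# At query time, retrieve the top-k chunks and let the LLM answer from them only.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("How do I configure authentication?"))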
3 votes · 1 answer
How to compute sentence level perplexity from hugging face language models?
I have a large collection of documents, each consisting of ~10 sentences. For each document, I wish to find the sentence that maximises perplexity, or equivalently the loss from a fine-tuned causal LM. I have decided to use Hugging Face and the…

pilu · 720 · 5 · 16
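A minimal sketch of the standard recipe ("gpt2" is a placeholder for the fine-tuned model): run each sentence through the causal LM with the input ids as labels, take the mean cross-entropy loss, and exponentiate it to get perplexity.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_perplexity(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids makes the model return the mean token-level
        # cross-entropy loss (it shifts the labels internally).
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

document = ["The cat sat on the mat.", "Colourless green ideas sleep furiously."]
print(max(document, key=sentence_perplexity))   # sentence with the highest perplexity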
2 votes · 1 answer
Backpropagation / minibatching in training large language models (LLMs)
I am struggling to understand how backprop works for transformer-based LLMs.
Here is my guess of how this process works. Given a sequence of tokens of length 64, we process the sequence in parallel using teacher forcing (i.e., for each ACTUAL…

Chinmaya Andukuri · 21 · 1
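A minimal sketch of one training step with a Hugging Face causal LM (tiny hypothetical batch): all positions are predicted in parallel from the ground-truth prefix (teacher forcing), the per-position losses are averaged, and a single backward pass per minibatch updates the weights.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(["the quick brown fox jumps over the lazy dog"], return_tensors="pt")

# Labels are the inputs themselves; the model shifts them internally, so position t
# is trained to predict token t+1 given the ACTUAL tokens 0..t (teacher forcing).
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()   # one backward pass for the whole minibatch
optimizer.step()
optimizer.zero_grad()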
2 votes · 0 answers
How to finetune an LLM model on your own codebase?
I have 10 code repositories in JavaScript (Vue.js); each repository corresponds to one theme.
I want to train an LLM model on these 10 code repositories so that I can generate new themes using prompts.
The LLM model should take the context of the 10 code…

Aadesh · 403 · 3 · 13
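A minimal sketch of the usual first step (paths and file extensions are hypothetical): collect the repository files into a text dataset that a causal-LM fine-tuning script, whether a full fine-tune or LoRA, can consume.

from pathlib import Path
from datasets import Dataset

REPO_DIRS = [f"themes/theme-{i}" for i in range(1, 11)]   # hypothetical repo locations
EXTENSIONS = {".vue", ".js", ".css", ".scss"}

records = []
for repo in REPO_DIRS:
    for path in Path(repo).rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            records.append({
                "repo": repo,
                # Prefixing each file with its path helps the model associate
                # code with its location inside a theme.
                "text": f"// FILE: {path}\n{path.read_text(errors='ignore')}",
            })

dataset = Dataset.from_list(records)
dataset.save_to_disk("theme_code_dataset")   # next: tokenize and fine-tune a causal LM on "text"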
2 votes · 1 answer
How can I load scraped page content to langchain VectorstoreIndexCreator
I have a function which goes to a url and crawls its content (+ from subpages). Then I want to load the text content to langchain VectorstoreIndexCreator(). How can I do it via a loader? I could not find any suitable loader in langchain.document_loaders.…

PetrSevcik · 89 · 1 · 9
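A minimal sketch of one way around the missing loader (2023-era langchain API; scrape_site stands in for the asker's crawler): wrap the scraped strings in Document objects and pass them to the index creator's from_documents method (check that the installed version exposes it), so no loader class is needed at all.

from langchain.docstore.document import Document
from langchain.indexes import VectorstoreIndexCreator

def scrape_site(url: str) -> list[str]:
    # Stand-in for the existing crawler that returns the page texts.
    return ["page one text ...", "subpage text ..."]

pages = scrape_site("https://example.com")
docs = [Document(page_content=text, metadata={"source": "https://example.com"})
        for text in pages]

index = VectorstoreIndexCreator().from_documents(docs)
print(index.query("What does the site say about pricing?"))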
2 votes · 2 answers
In Langchain, why is ConversationalRetrievalChain not remembering the chat history and entering a new ConversationalRetrievalChain chain for each chat?
I am trying to create a customer support system using langchain. I am using text documents as an external knowledge provider via TextLoader.
In order to remember the chat, I am using ConversationalRetrievalChain with a list of chats.
My problem is, each time…

RagAnt · 1,064 · 2 · 17 · 35
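A minimal sketch of the pattern that usually resolves this (2023-era langchain API, placeholder file and model names): build the chain once, outside the chat loop, and attach a ConversationBufferMemory instead of constructing a new chain per message.

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

docs = TextLoader("support_docs.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents(docs)
retriever = FAISS.from_documents(chunks, OpenAIEmbeddings()).as_retriever()

# The memory holds the chat history; the chain must be created ONCE and reused
# for every turn, otherwise the history is lost on each request.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
    memory=memory,
)

print(chain({"question": "How do I reset my password?"})["answer"])
print(chain({"question": "And what if that doesn't work?"})["answer"])  # remembers the first turn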
2 votes · 1 answer
How to use cross-encoder with Huggingface transformers pipeline?
There's a set of models on the Hugging Face hub that come from the sentence_transformers library, e.g. https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
The suggested usage examples are:
# Using sentence_transformers
from…

alvas · 115,346 · 109 · 446 · 738
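A minimal sketch of the plain-transformers route (my reading of this kind of model card, not an official pipeline recipe): load the cross-encoder as a sequence-classification model, tokenize each (query, passage) pair jointly, and read the single logit as the relevance score.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of about 3.7 million inhabitants.",
    "The quick brown fox jumps over the lazy dog.",
]

# The cross-encoder scores each (query, passage) pair jointly in one forward pass.
inputs = tokenizer([query] * len(passages), passages,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

for passage, score in zip(passages, scores.tolist()):
    print(f"{score:.3f}  {passage}")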
2 votes · 1 answer
Further finetune a Peft/LoRA finetuned CausalLM Model
I am a bit unsure how to proceed regarding the topic mentioned in the title.
The baseline is a model created via Hugging Face's library as an AutoModelForCausalLM model, fine-tuned with PEFT and a LoRA approach, with the weights subsequently merged.
I now want to further fine…

Julian Gerhard · 86 · 1 · 4
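A minimal sketch of one reasonable route (the model path and LoRA hyperparameters are placeholders): treat the merged checkpoint as a new base model and attach a fresh LoRA adapter to it for the second round of training.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# The previously fine-tuned model with the first LoRA weights already merged in.
base = AutoModelForCausalLM.from_pretrained("path/to/merged-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/merged-model")

# Attach a brand-new LoRA adapter on top of the merged weights and train only it.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to the architecture's module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ... continue with the usual training loop, then merge_and_unload() again if desired.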
2 votes · 2 answers
Alpaca Large Language Model from Python script
I was able to install Alpaca under Linux and start and use it interactively via the corresponding ./chat command.
However, I would like to run it not in interactive mode but from a Python (Jupyter) script with the prompt as a string parameter. Also,…

DJRDS · 27 · 5
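A minimal sketch of one way to drive the binary from Python with subprocess. This assumes the ./chat build reads a prompt from stdin and writes its reply to stdout, which may not hold for every build; check ./chat --help for a non-interactive prompt flag (llama.cpp-derived builds often accept -p).

import subprocess

def ask_alpaca(prompt: str, timeout: int = 120) -> str:
    # Assumption: the chat binary accepts the prompt on stdin and prints the
    # completion to stdout; adjust the arguments to match the build's options.
    result = subprocess.run(
        ["./chat"],
        input=prompt + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout

print(ask_alpaca("Explain what a large language model is in one sentence."))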
2 votes · 0 answers
Training huggingface's GPT2 from scratch: how to implement causal mask?
I am trying to train huggingface's implementation of the GPT2 model from scratch (meaning I am using their architecture but not using pre-trained weights) but I noticed by looking into the code here…

Johncowk · 342 · 1 · 16
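A minimal sketch of training GPT-2 from scratch with the Hugging Face architecture: instantiate the model from a config (random weights) rather than with from_pretrained. The causal mask is built into GPT2Attention as a registered lower-triangular buffer, so it is never passed in; the attention_mask argument only masks padding tokens.

from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")   # reuse the vocab, not the weights

# Randomly initialised GPT-2: same architecture, no pre-trained weights.
config = GPT2Config(vocab_size=tokenizer.vocab_size, n_positions=512,
                    n_embd=256, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)

batch = tokenizer(["hello world, this is a causal LM"], return_tensors="pt")
# No causal mask is passed here: the attention layers apply their own triangular
# mask internally; attention_mask from the tokenizer only hides padding.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)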