Questions tagged [huggingface-tokenizers]
Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers
451 questions
41
votes
5 answers
How to disable TOKENIZERS_PARALLELISM=(true | false) warning?
I use PyTorch to train a huggingface-transformers model, but at every epoch it prints this warning:
The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this warning, please explicitly set…

snowzjy
- 521
- 1
- 4
- 5
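The warning names its own fix: set the environment variable explicitly. A minimal sketch (the training loop is omitted), assuming the variable is set before any tokenizer spins up its thread pool:

import os

# Disable tokenizer parallelism before transformers/tokenizers are imported,
# so forked DataLoader workers no longer trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoTokenizer  # imported after the env var is set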
38
votes
5 answers
ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(path):
    df = pd.read_csv(path)
    return train_test_split(df, test_size=0.1, random_state=100)

train, test = split_data(DATA_DIR)
train_texts, train_labels = train['text'].to_list(), train['sentiment'].to_list()
test_texts,…

Raoof Naushad
- 526
- 1
- 5
- 7
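Fast tokenizers raise this ValueError when an input is not a plain string, typically a NaN read from the CSV. A hedged sketch of the usual cleanup, reusing the question's 'text' and 'sentiment' columns; the file path is a placeholder:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("reviews.csv")        # placeholder path
df = df.dropna(subset=["text"])        # NaN rows break TextEncodeInput
df["text"] = df["text"].astype(str)    # force every entry to str
train, test = train_test_split(df, test_size=0.1, random_state=100)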
31
votes
4 answers
Transformers v4.x: Convert slow tokenizer to fast tokenizer
I'm following the transformers pretrained model xlm-roberta-large-xnli example:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")
and I get the…

Miguel Trejo
- 5,913
- 5
- 24
- 49
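The usual culprit here is a missing sentencepiece backend: converting a sentencepiece-based slow tokenizer (such as XLM-RoBERTa's) to a fast one needs extra packages. A sketch of the commonly reported fix, not a guaranteed one:

# pip install sentencepiece protobuf   <- required for the slow->fast conversion
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")
# Fallback: build the slow tokenizer yourself and hand it to the pipeline
# via its tokenizer= argument.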
25
votes
3 answers
Huggingface saving tokenizer
I am trying to save the tokenizer in huggingface so that I can load it later from a container where I don't need access to the internet.
BASE_MODEL = "distilbert-base-multilingual-cased"
tokenizer =…

sachinruk
- 9,571
- 12
- 55
- 86
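A save_pretrained/from_pretrained round trip covers this; a minimal sketch with a placeholder directory:

from transformers import AutoTokenizer

BASE_MODEL = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Writes vocab and tokenizer config files to a local directory...
tokenizer.save_pretrained("./tokenizer")      # placeholder path

# ...which can then be loaded inside the offline container:
tokenizer = AutoTokenizer.from_pretrained("./tokenizer")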
22
votes
2 answers
Suppress HuggingFace logging warning: "Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."
In HuggingFace, every time I call a pipeline() object, I get a warning:
`"Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."
How do I suppress this warning without suppressing all logging warnings? I want other…

Rylan Schaeffer
- 1,945
- 2
- 28
- 50
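One targeted way to silence exactly this message is to set pad_token_id yourself, so generate() no longer announces its default. A sketch assuming a GPT-2 text-generation pipeline (the question does not name the model):

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # model is an assumption

# Generation kwargs pass straight through the pipeline call; an explicit
# pad_token_id removes the warning without muting the logging module.
out = generator("Hello there", pad_token_id=generator.tokenizer.eos_token_id)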
22
votes
1 answer
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text-classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the three arguments work. For example, if I have 3 sentences as:
'My name is slim shade and I am an aspiring…

Deshwal
- 3,436
- 4
- 35
- 94
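In short: truncation=True cuts sequences down to max_length, while padding="max_length" pads every sequence up to it, so the batch comes out rectangular. A small sketch with made-up sentences:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["My name is slim shade and I am an aspiring rapper",
             "A short one",
             "Another sentence"]

enc = tokenizer(sentences, max_length=10, truncation=True,
                padding="max_length", return_tensors="pt")
print(enc["input_ids"].shape)  # torch.Size([3, 10]): truncated or padded to 10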
20
votes
6 answers
Huggingface ALBERT tokenizer NoneType error with Colab
I simply tried the sample code from the Hugging Face website: https://huggingface.co/albert-base-v2
from transformers import AlbertTokenizer, AlbertModel
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
text = "Replace me by any text you'd…

MeiNan Zhu
- 1,021
- 1
- 9
- 18
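The commonly reported cause is a missing sentencepiece dependency: without it the ALBERT tokenizer cannot be built and comes back as None. A sketch of the usual Colab remedy (install, then restart the runtime):

# !pip install sentencepiece   <- then restart the Colab runtime
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")
assert tokenizer is not None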
16
votes
3 answers
How to truncate input in the Huggingface pipeline?
I currently use a huggingface pipeline for sentiment-analysis like so:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', device=0)
The problem is that when I pass texts longer than 512 tokens it just crashes, saying that…

EtienneT
- 5,045
- 6
- 36
- 39
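In recent transformers versions, tokenizer kwargs can be passed straight through the pipeline call, so truncation happens before the model sees the text. A sketch with a synthetic long input:

from transformers import pipeline

classifier = pipeline("sentiment-analysis", device=0)
long_text = "This film was astonishing. " * 300   # well past 512 tokens

# truncation/max_length are forwarded to the underlying tokenizer call:
result = classifier(long_text, truncation=True, max_length=512)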
15
votes
2 answers
BertModel transformers outputs string instead of tensor
I'm following this tutorial, which codes a sentiment-analysis classifier using BERT with the huggingface library, and I'm seeing very odd behavior. When I try the BERT model with a sample text, I get a string instead of the hidden state. This is the…

Miguel
- 2,738
- 3
- 35
- 51
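This is the transformers v4 behavior change: models return a ModelOutput object, and tuple-unpacking it iterates over its keys, which are strings. A sketch of both fixes, named access or return_dict=False:

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
enc = tokenizer("a sample text", return_tensors="pt")

out = model(**enc)
hidden = out.last_hidden_state               # a tensor, not a string
# or restore the old tuple behavior:
hidden, pooled = model(**enc, return_dict=False)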
15
votes
2 answers
How to encode multiple sentences using transformers.BertTokenizer?
I would like to create a minibatch by encoding multiple sentences using transformers.BertTokenizer. It seems to work for a single sentence. How do I make it work for several sentences?
from transformers import BertTokenizer
tokenizer =…

Lei Hao
- 708
- 1
- 7
- 21
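Passing a list to the tokenizer's __call__ (with padding enabled) builds the minibatch in one step; a minimal sketch:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = ["First sentence.", "A somewhat longer second sentence."]

enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
print(enc["input_ids"].shape)    # (2, length of the longest sequence)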
13
votes
2 answers
Download pre-trained sentence-transformers model locally
I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-mean-tokens. I have an application that will be…

neha tamore
- 181
- 1
- 1
- 8
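One download-once-then-load-locally sketch; the target directory is a placeholder:

from sentence_transformers import SentenceTransformer

# While online: download the model and persist all its files locally.
model = SentenceTransformer("bert-base-nli-mean-tokens")
model.save("./bert-base-nli-mean-tokens-local")     # placeholder path

# Later, fully offline: load from the saved directory.
model = SentenceTransformer("./bert-base-nli-mean-tokens-local")
embeddings = model.encode(["a sentence to embed"])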
10
votes
3 answers
HuggingFace AutoModelForCausalLM "decoder-only architecture" warning, even after setting padding_side='left'
I'm using AutoModelForCausalLM and AutoTokenizer to generate text output with DialoGPT.
For whatever reason, even when using the provided examples from huggingface, I get this warning:
A decoder-only architecture is being used, but right-padding was…

TurboToaster33
- 101
- 1
- 4
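padding_side='left' alone is often not enough: the tokenizer also needs a pad token, and generate() needs the attention mask. A sketch assuming DialoGPT, as in the question:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium",
                                          padding_side="left")
tokenizer.pad_token = tokenizer.eos_token     # GPT-2 family has no pad token
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

enc = tokenizer("Hello, how are you?", return_tensors="pt", padding=True)
out = model.generate(enc["input_ids"],
                     attention_mask=enc["attention_mask"],  # this silences it
                     pad_token_id=tokenizer.eos_token_id,
                     max_new_tokens=40)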
10
votes
3 answers
Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging face sentiment classifier
I'm trying to get the sentiment for comments with the help of a hugging face sentiment-analysis pretrained model. It returns an error like: Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512)…

Nithin Reddy
- 580
- 2
- 8
- 18
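Truncating at tokenization time keeps every comment within the model's 512-token window; here it is forwarded through the pipeline call (the model name is an assumption, the question does not give one):

from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
comment = "word " * 700                      # longer than the 512-token limit

result = classifier(comment, truncation=True, max_length=512)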
8
votes
4 answers
Facing SSL Error with Huggingface pretrained models
I am facing the issue below while loading a pretrained model from HuggingFace.
HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /roberta-base/resolve/main/config.json (Caused by SSLError(SSLCertVerificationError(1,…

chaitu
- 1,036
- 5
- 20
- 39
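This usually points at a proxy or self-signed certificate in the chain. One hedged workaround is to hand the HTTP stack your organisation's CA bundle; the path below is a placeholder, not a real file:

import os

# Point requests (which huggingface_hub uses for downloads) at the
# corporate CA bundle before any download is attempted.
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/corporate-ca-bundle.pem"

from transformers import AutoModel
model = AutoModel.from_pretrained("roberta-base")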
8
votes
1 answer
What is so special about special tokens?
What exactly is the difference between a "token" and a "special token"?
I understand the following:
what is a typical token
what is a typical special token: MASK, UNK, SEP, etc
when do you add a token (when you want to expand your vocab)
What I…

ShaoMin Liu
- 93
- 1
- 6
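The practical difference: a special token is shielded from the tokenizer's normalization and splitting, and can be dropped wholesale when decoding. A small sketch contrasting add_tokens with add_special_tokens (the token names are made up):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["newword"])                    # plain vocabulary entry
tokenizer.add_special_tokens({"additional_special_tokens": ["[CTRL]"]})

ids = tokenizer.encode("[CTRL] newword")
print(tokenizer.decode(ids))                            # keeps [CTRL]
print(tokenizer.decode(ids, skip_special_tokens=True))  # drops it: 'newword'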