Questions tagged [sentencepiece]
21 questions
13
votes
1 answer
sentencepiece library is not being installed in the system
While running pip install tf-models-official, I ran into the following problem while the library was being installed:
Collecting tf-models-official
Using cached tf_models_official-2.8.0-py2.py3-none-any.whl (2.2 MB)
Requirement already satisfied:…

Daremitsu
- 545
- 2
- 8
- 24
8
votes
2 answers
How to add a new special token to the tokenizer?
I want to build a multi-class classification model for which I have conversational data as input for the BERT model (using bert-base-uncased).
QUERY: I want to ask a question.
ANSWER: Sure, ask away.
QUERY: How is the weather today?
ANSWER: It is…

sid8491
- 6,622
- 6
- 38
- 64
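With HuggingFace transformers, the usual route is `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`. The sketch below uses a hypothetical toy vocab to illustrate the bookkeeping those two calls perform, and why the embedding matrix has to grow:

```python
# Toy vocab (hypothetical) standing in for a real tokenizer's vocabulary.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}
embedding_rows = len(vocab)  # one embedding vector per token id

def add_special_tokens(vocab, tokens):
    """Append unseen tokens at the end of the vocab; return how many were added."""
    added = 0
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # new tokens get fresh ids past the old range
            added += 1
    return added

added = add_special_tokens(vocab, ["[QUERY]", "[ANSWER]"])
embedding_rows += added  # mirrors model.resize_token_embeddings(len(tokenizer))

print(vocab["[QUERY]"], vocab["[ANSWER]"], embedding_rows)  # 4 5 6
```

Forgetting the resize step is the classic cause of index errors once the new ids reach the model.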
5
votes
1 answer
Why does the HuggingFace T5 tokenizer ignore some of the whitespace?
I am using the T5 model and tokenizer for a downstream task. I want to add certain whitespace tokens to the tokenizer, like line ending (\n) and tab (\t). Adding these tokens works, but somehow the tokenizer always ignores the second whitespace. So, it tokenizes…

Berkay Berabi
- 1,933
- 1
- 10
- 26
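A likely culprit (an assumption, not confirmed by the question): SentencePiece-style preprocessing typically collapses runs of whitespace and rewrites spaces to the metaspace symbol '▁' before added-token matching happens, so a second '\n' or '\t' is gone before the tokenizer ever sees it. A minimal sketch of that normalization step:

```python
import re

METASPACE = "\u2581"  # the '▁' symbol SentencePiece uses for spaces

def normalize(text: str) -> str:
    # Collapse any run of whitespace (space, \t, \n) into a single space.
    return re.sub(r"\s+", " ", text)

def to_metaspace(text: str) -> str:
    # Normalize first, then rewrite spaces to the metaspace symbol.
    return normalize(text).replace(" ", METASPACE)

print(to_metaspace("a\t\tb"))   # 'a▁b' — the second tab is gone
print(to_metaspace("a \n b"))   # 'a▁b'
```

If this is the cause, the fix is to disable or customize the tokenizer's normalizer rather than to add more whitespace tokens.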
2
votes
0 answers
SentencePiece tokenizer encodes to unknown token
I am using the HuggingFace implementation of the SentencePiece tokenizer, i.e., the SentencePieceBPETokenizer and SentencePieceUnigramTokenizer classes. I train these tokenizers on a dataset which has no Unicode characters and then try to encode a string that…

Shital Shah
- 63,284
- 17
- 238
- 185
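The behavior described is the normal `<unk>` fallback: a piece absent from the trained vocabulary can only be emitted as the unknown id, which is exactly what happens when the training data never contained the characters being encoded. A toy illustration (not the real SentencePiece algorithm):

```python
# Hypothetical tiny vocabulary; id 0 is reserved for the unknown token.
vocab = {"<unk>": 0, "\u2581hel": 1, "lo": 2}
UNK_ID = vocab["<unk>"]

def encode_pieces(pieces):
    """Map each piece to its id, falling back to <unk> for out-of-vocabulary pieces."""
    return [vocab.get(p, UNK_ID) for p in pieces]

print(encode_pieces(["\u2581hel", "lo"]))   # [1, 2]
print(encode_pieces(["\u2581caf\u00e9"]))   # [0]  — unseen piece maps to <unk>
```

The remedy is usually to include the relevant characters in the training corpus or to enable byte/character fallback when training the tokenizer.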
2
votes
1 answer
Error while converting a .pth file to ggml format
This is the error I get when I run convert-pth-to-ggml.py. I don't know whether it is caused by my file management (leaving the model unable to load) or by the OS:
Traceback (most recent call last):
File…

Tanish Shah
- 39
- 5
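To rule out the file-management possibility before blaming the OS, a quick pre-flight check helps. This is a sketch; the expected names below match the original LLaMA checkpoint layout that convert-pth-to-ggml.py worked with, and may differ for other models:

```python
import glob
import os

def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found with the checkpoint directory."""
    problems = []
    if not os.path.isdir(model_dir):
        return [f"not a directory: {model_dir}"]
    if not glob.glob(os.path.join(model_dir, "consolidated.*.pth")):
        problems.append("no consolidated.*.pth checkpoint files")
    if not os.path.isfile(os.path.join(model_dir, "params.json")):
        problems.append("missing params.json")
    if not os.path.isfile(os.path.join(model_dir, "..", "tokenizer.model")):
        problems.append("missing tokenizer.model next to the model folder")
    return problems

print(check_model_dir("models/7B"))
```

An empty list means the layout looks right and the traceback points elsewhere.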
2
votes
1 answer
(OpenNMT) Spanish to English Model Improvement
I'm currently trying to train a Spanish-to-English model using YAML scripts. My dataset is pretty big, but just for starters I'm trying to get a 10,000 training set and a 1,000-2,000 validation set working well first. However, after trying for days, I…

Jose Chavez
- 115
- 9
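For reference, a minimal OpenNMT-py YAML config for this kind of setup looks roughly like the sketch below. All paths, vocab files, and step counts are placeholders, not values from the question:

```yaml
# Minimal OpenNMT-py training config sketch (placeholder paths and sizes).
save_data: run/example
src_vocab: run/example.vocab.src
tgt_vocab: run/example.vocab.tgt
data:
  corpus_1:
    path_src: data/train.es
    path_tgt: data/train.en
  valid:
    path_src: data/valid.es
    path_tgt: data/valid.en
save_model: run/model
train_steps: 10000
valid_steps: 1000
```

With only ~10k training pairs, poor results are expected regardless of the config; small-data runs mainly verify the pipeline works end to end.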
1
vote
1 answer
libsentencepiece.so.0: cannot open shared object file: No such file or directory when creating BERTopic model
I am trying to train a BERTopic Model in python. However, I get this error:
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
libsentencepiece.so.0: cannot open shared object file:…

kmcclenn
- 127
- 11
1
vote
0 answers
Getting "Unable to load vocabulary from file." while using pipelines
I have been trying to use the "csebuetnlp/mT5_multilingual_XLSum" model for summarization purposes.
The code I tried is listed as below:
!pip install transformers
!pip install sentencepiece
import transformers
text_example = """
En düşük emekli…

dicloflom
- 11
- 1
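This error frequently means the sentencepiece package was not importable at the moment transformers resolved the tokenizer; in notebooks, the runtime usually has to be restarted after `!pip install sentencepiece`. A small guard (a sketch) to run before building the pipeline:

```python
import importlib.util

def has_module(name: str) -> bool:
    """True if `name` can be imported in the current runtime."""
    return importlib.util.find_spec(name) is not None

# Check before calling transformers.pipeline("summarization", ...).
if not has_module("sentencepiece"):
    print("sentencepiece missing - install it and restart the runtime")
```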
1
vote
0 answers
How to integrate sentencepiece and protobuf into an existing Android project correctly
I am trying to integrate a PyTorch model to process language, which is why I need sentencepiece to tokenize the sentence chunks. But I am unable to do that correctly.
I did not find any robust documentation on integrating sentencepiece into an Android…

im07
- 386
- 2
- 12
1
vote
1 answer
Saving SentencepieceTokenizer in Keras model throws TypeError: Failed to convert elements of [None, None] to Tensor
I'm trying to save a Keras model which uses a SentencepieceTokenizer.
Everything is working so far but I am unable to save the Keras model.
After training the sentencepiece model, I am creating the Keras model, call it with some examples first and…

Stefan Falk
- 23,898
- 50
- 191
- 378
1
vote
0 answers
Slow and fast tokenizers give different outputs (sentencepiece tokenization)
When I use T5TokenizerFast (the fast tokenizer of the T5 architecture), the output is as expected:
['▁', '', '▁Hello', '▁', '', '']
But when I use the normal (slow) tokenizer, it starts to split the special token "</s>" as follows:
['▁', 's', '>',…

canP
- 25
- 4
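The discrepancy comes down to whether "</s>" is matched as one atomic special token or falls through to ordinary piece-splitting. A toy splitter (not the real T5 implementation) showing both behaviors:

```python
def tokenize(text, special_tokens=()):
    """Greedy left-to-right split: match registered special tokens atomically,
    otherwise fall back to single-character pieces."""
    out, i = [], 0
    while i < len(text):
        for sp in special_tokens:
            if text.startswith(sp, i):  # atomic match of a special token
                out.append(sp)
                i += len(sp)
                break
        else:
            out.append(text[i])         # fallback: emit one character
            i += 1
    return out

print(tokenize("hi</s>", special_tokens=("</s>",)))  # ['h', 'i', '</s>']
print(tokenize("hi</s>"))                            # ['h', 'i', '<', '/', 's', '>']
```

If the slow tokenizer splits the token, it usually was not registered in that tokenizer's special-token list, even though the fast one knows it.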
1
vote
1 answer
SentencePiece in Google Colab
I want to use sentencepiece, from https://github.com/google/sentencepiece, in a Google Colab project where I am training an OpenNMT model. I'm a little confused about how to set up the sentencepiece binaries in Google Colab. Do I need to build with…

Jose Chavez
- 115
- 9
1
vote
1 answer
How to add a new token to a T5 tokenizer which uses sentencepiece
I am training the T5 transformer (based on TensorFlow) from the following link:
https://github.com/google-research/text-to-text-transfer-transformer
Here is a sample (input, output):
input:
b'[atomic]:PersonX plays a ___ in the…

Ahmad
- 8,811
- 11
- 76
- 141
1
vote
0 answers
"OSError: Model name './XX' was not found in tokenizers model name list" - cannot load custom tokenizer in Transformers
I'm trying to create my own tokenizer with my own dataset/vocabulary using SentencePiece and then use it with AlbertTokenizer from transformers.
I followed the HuggingFace tutorial on how to train a model from scratch really closely:…

tlqn
- 349
- 1
- 6
- 18
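AlbertTokenizer loads a SentencePiece model file, conventionally named "spiece.model", and this OSError is usually the loader failing to find that file in the given directory. A quick check (a sketch; the alternative filenames are conventions used by other model families) before calling from_pretrained:

```python
import os

def find_spm_model(tokenizer_dir: str):
    """Return the path of the SentencePiece model file in the directory, or None."""
    for name in ("spiece.model", "tokenizer.model", "sentencepiece.bpe.model"):
        candidate = os.path.join(tokenizer_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

If this returns None for your tokenizer directory, rename or copy your trained .model file to "spiece.model" so AlbertTokenizer can pick it up.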
1
vote
1 answer
How can I update sentencepiece package to its latest version using conda?
I have installed conda on Linux (Ubuntu 16). When I install or update the package named sentencepiece, it installs version 0.1.85 (which I guess is from 2 months ago, according to the Anaconda website). However, the latest version is 0.1.91.
I can't install…

Ahmad
- 8,811
- 11
- 76
- 141