Highest Voted 'language-model' Questions

21

votes

4 answers

word2vec - what is best? add, concatenate or average word vectors?

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the…

asked Oct 23 '17 at 12:44

Lemon

1,394
3
14
24

20

votes

5 answers

How to compute skipgrams in python?

A k-skipgram is an ngram which is a superset of all ngrams and each (k-i)-skipgram until (k-i)==0 (which includes 0-skipgrams). So how can I efficiently compute these skipgrams in Python? Following is the code I tried but it is not doing as…

python nlp n-gram language-model

asked Aug 06 '15 at 05:44

stackit

3,036
9
34
62

18

votes

2 answers

Character-Word Embeddings from lm_1b in Keras

I would like to use some pre-trained word embeddings in a Keras NN model, which have been published by Google in a very well known article. They have provided the code to train a new model, as well as the embeddings here. However, it is not clear…

machine-learning nlp keras language-model word-embedding

asked May 31 '17 at 01:19

chase

3,592
8
37
58

18

votes

3 answers

ARPA language model documentation

Where can I find documentation on the ARPA language model format? I am developing a simple speech recognition app with the pocket-sphinx STT engine. ARPA is recommended there for performance reasons. I want to understand how much I can do to adjust my…

nlp speech-recognition cmusphinx sphinx4 language-model

asked May 06 '13 at 22:14

Lukasz

19,816
17
83
139

17

votes

2 answers

Building openears compatible language model

I am doing some development on speech to text and text to speech and I found the OpenEars API very useful. This cmu-slm based API works by using a language model to map the speech heard by the iPhone device. So I decided to find a…

iphone speech-recognition language-model

asked Mar 07 '11 at 14:08

harshalb

6,012
13
56
92

14

votes

2 answers

Creating ARPA language model file with 50,000 words

I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is there any other link available where I can get a language model for this many words?

speech-recognition cmusphinx n-gram language-model

asked Apr 21 '11 at 11:24

Vipin

4,718
12
54
81

12

votes

1 answer

TensorFlow Embedding Lookup

I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some example models put up on the TensorFlow page TF-RNN. As advised, I took some time to understand how word IDs are…

tensorflow word2vec recurrent-neural-network language-model

asked Jun 18 '16 at 14:15

VM_AI

1,132
4
13
25

11

votes

2 answers

NLTK package to estimate the (unigram) perplexity

I am trying to calculate the perplexity for the data I have. The code I am using is: import sys sys.path.append("/usr/local/anaconda/lib/python2.7/site-packages/nltk") from nltk.corpus import brown from nltk.model import NgramModel from…

python-2.7 nlp nltk n-gram language-model

asked Oct 21 '15 at 18:48

Ana_Sam

469
2
4
12

10

votes

2 answers

Python interface to ARPA files

I'm looking for a pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. get its log-probability, perplexity etc. I don't need to generate the ARPA file in Python, only to use it for querying. Does…

python nlp n-gram language-model

asked May 26 '14 at 04:05

Beka

725
6
22

8

votes

1 answer

calculate perplexity in pytorch

I've just trained an LSTM language model using pytorch. The main body of the class is this: class LM(nn.Module): def __init__(self, n_vocab, seq_size, embedding_size, …

python nlp pytorch language-model

asked Dec 06 '19 at 07:58

P.Alipoor

178
1
2
11

8

votes

5 answers

Which model (GPT2, BERT, XLNet and etc) would you use for a text classification task? Why?

I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT2 for text classification tasks.…

tensorflow nlp language-model bert-language-model

asked Sep 08 '19 at 20:14

khemedi

774
3
9
19

7

votes

2 answers

Pretraining a language model on a small custom corpus

I was curious if it is possible to use transfer learning in text generation, and re-train/pre-train it on a specific kind of text. For example, having a pre-trained BERT model and a small corpus of medical (or any "type") text, make a language…

deep-learning transfer-learning huggingface-transformers language-model bert-language-model

asked Apr 24 '20 at 19:38

ysig

447
4
18

6

votes

1 answer

Using custom beam scorer in TensorFlow CTC (language model)

Is it possible to customize the beam scorer in the TensorFlow CTC implementation from the Python side? I see this possibility in a comment for the CTCBeamSearchDecoder C++ class constructor but wonder how to provide this functionality for Python users. Specific issue…

tensorflow language-model

asked Jun 21 '16 at 14:45

Maksym Diachenko

552
1
4
11

5

votes

0 answers

Starcoder finetuning - How to select the GPU and how to estimate the time it will take to finetune

I'd like to finetune Starcoder (https://huggingface.co/bigcode/starcoder) on my dataset and on a GCP VM instance. It says in the documentation that for training the model, they used 512 Tesla A100 GPUs and it took 24 days. I also saw the model…

deep-learning pytorch huggingface language-model large-language-model

asked Jun 01 '23 at 17:22

Aadesh

403
3
13

5

votes

0 answers

Is there a particular range for good perplexity value in NLP?

I'm fine-tuning a language model and am calculating training and validation losses along with the training and validation perplexities. In my program, perplexity is calculated by taking the exponential of the loss. I'm aware that lower perplexities represent…

deep-learning neural-network nlp language-model perplexity

asked Jun 23 '20 at 03:36

Dilrukshi Perera

917
3
17
31
