
I want to generate n-grams from a sequence of tokens:

bigram: "1 3 4 5" --> { (1,3), (3,4), (4,5) }

After searching, I found a thread that used:

def find_ngrams(input_list, n):
  # Offset the sequence n times and zip the slices into n-tuples.
  return zip(*[input_list[i:] for i in range(n)])
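For the sequence in the example above, this reproduces the expected bigrams:

list(find_ngrams([1, 3, 4, 5], 2))
# [(1, 3), (3, 4), (4, 5)]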

If I use this piece of code at training time, I think it will kill performance, so I am looking for a better option.


1 Answer


If you need to generate bigrams in string format:

import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x; eager execution is the default in TF 2.x

sentence = ['this is example sentence']
tokens = tf.string_split(sentence).values
bigrams = tokens[:-1] + ' ' + tokens[1:]  # join each token with its successor

# tf.Tensor([b'this is' b'is example' b'example sentence'], shape=(3,), dtype=string)
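If you are on TF 2.x, where eager execution is on by default and tf.string_split was replaced by tf.strings.split, a minimal equivalent sketch of the same trick:

import tensorflow as tf  # TF 2.x

tokens = tf.strings.split('this is example sentence')  # 1-D string tensor
bigrams = tf.strings.join([tokens[:-1], tokens[1:]], separator=' ')
# tf.Tensor([b'this is' b'is example' b'example sentence'], shape=(3,), dtype=string)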

You can also use tensorflow-transform to generate n-grams.

import tensorflow_transform as tft

# `tokens` is a SparseTensor of string tokens (e.g. from tf.string_split);
# (1, 2) is the inclusive range of n-gram sizes to produce.
tft.ngrams(tokens, (1, 2), ' ')

Note: tensorflow-transform supported only Python 2 until 22 January 2019.
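A minimal end-to-end sketch, assuming TF 1.x eager execution as above; tft.ngrams takes a SparseTensor of tokens, and the exact ordering of the output values may vary:

import tensorflow as tf
import tensorflow_transform as tft

tf.enable_eager_execution()

tokens = tf.string_split(['this is example sentence'])  # SparseTensor of tokens
ngrams = tft.ngrams(tokens, (1, 2), ' ')
# ngrams.values holds both unigrams and bigrams, e.g.
# b'this', b'this is', b'is', b'is example', b'example', b'example sentence', b'sentence'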

    The added bonus w/ these tf-transform ops is that they are driven by core graph ops, so they work outside of python! At least w/ my small experiment w/ `ngrams`... – eggie5 Oct 21 '19 at 02:49