I'm trying to understand how to prepare paragraphs for ELMo vectorization.
The docs only show how to embed multiple sentences/words at a time, e.g.:
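```python
import tensorflow_hub as hub

# TF1-style loading, as shown on the module's TF Hub page
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

sentences = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog", ""]]  # "" pads the shorter sentence
embeddings = elmo(
    inputs={
        "tokens": sentences,
        "sequence_len": [6, 5]  # real length of each sentence
    },
    signature="tokens",
    as_dict=True
)["elmo"]
```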
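For reference, the docs example pads the shorter sentence with an empty string and passes each sentence's real length in `sequence_len`. Below is a minimal sketch of building those two inputs from my own tokenized sentences; `pad_batch` is a hypothetical helper (not part of TensorFlow Hub), and the `""` padding token is simply copied from the docs example.

```python
from typing import List, Tuple


def pad_batch(
    tokenized: List[List[str]], pad: str = ""
) -> Tuple[List[List[str]], List[int]]:
    """Hypothetical helper: pad token lists to equal length, return (tokens, sequence_len)."""
    # Real (unpadded) length of every sentence, used for "sequence_len"
    lengths = [len(sent) for sent in tokenized]
    max_len = max(lengths, default=0)
    # Right-pad each sentence with the pad token up to the longest one
    padded = [sent + [pad] * (max_len - len(sent)) for sent in tokenized]
    return padded, lengths


tokens, seq_len = pad_batch([
    ["the", "cat", "is", "on", "the", "mat"],
    ["dogs", "are", "in", "the", "fog"],
])
print(seq_len)  # [6, 5]
```

If I read the module page correctly, the `elmo` output is per token, with shape `[batch_size, max_length, 1024]`, while the `default` output is a mean-pooled `[batch_size, 1024]` vector per sequence.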
As I understand it, this will return two vectors, each representing a given sentence. How would I go about preparing the input data to vectorize a whole paragraph containing multiple sentences? Note that I would like to use my own preprocessing.
Can this be done like so?
```python
sentences = [["<s>", "the", "cat", "is", "on", "the", "mat", ".", "</s>",
              "<s>", "dogs", "are", "in", "the", "fog", ".", "</s>"]]
```
Or maybe like so?
```python
sentences = [["the", "cat", "is", "on", "the", "mat", ".",
              "dogs", "are", "in", "the", "fog", "."]]
```
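To make the two candidates concrete, here is a sketch of how my own preprocessing could turn a raw paragraph into either input. The sentence splitter and tokenizer are naive regex placeholders (assumptions, not what ELMo or TF Hub does internally), and `split_sentences` / `paragraph_as_one_sequence` are hypothetical helpers. Both variants yield a batch of one sequence plus its `sequence_len`.

```python
import re
from typing import List, Tuple


def split_sentences(paragraph: str) -> List[List[str]]:
    """Hypothetical preprocessing: naive regex sentence split and tokenization."""
    # Split after ".", "!" or "?" when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Words and punctuation become separate tokens; lowercased only to
    # mirror the lowercase examples above
    return [re.findall(r"\w+|[^\w\s]", s.lower()) for s in sentences if s]


def paragraph_as_one_sequence(
    paragraph: str, boundaries: bool = False
) -> Tuple[List[List[str]], List[int]]:
    """Flatten all sentences into a single token sequence (a batch of one)."""
    tokens: List[str] = []
    for sent in split_sentences(paragraph):
        # Optionally wrap each sentence in <s> ... </s> markers (first candidate)
        tokens.extend(["<s>", *sent, "</s>"] if boundaries else sent)
    return [tokens], [len(tokens)]


paragraph = "The cat is on the mat. Dogs are in the fog."
plain, plain_len = paragraph_as_one_sequence(paragraph)
marked, marked_len = paragraph_as_one_sequence(paragraph, boundaries=True)
print(plain_len, marked_len)  # [13] [17]
```

Either pair could be passed as the `"tokens"` and `"sequence_len"` inputs of the call above; whether the module expects explicit boundary markers is exactly what I'm unsure about.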