I want to create spaCy doc
given I have raw text and words
but missing whitespaces data.
from spacy.tokens import Doc
doc = Doc(nlp.vocab, words=words, spaces=spaces)
How to do it correctly so information about whitespaces was not lost ? Example of data I have :
data= {'text': 'This is just a test sample.', 'words': ['This', 'is', 'just', 'a', 'test', 'sample', '.']}