For each document, the bag-of-words model gives a set of sparse features. For example (using your first sentence as the example):
OneWord
AnotherWord
AndSoOn
The above three are the three active features for the document. The representation is sparse because we never list the inactive features explicitly, and the vocabulary (all unique words you consider as features) is very large. In other words, we did not say:
OneWord
AnotherWord
AndSoOn
FirstWordNewSentence: false
We only include the words that are actually present ("true").
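A minimal sketch of this sparse representation (the word names and the two example documents are placeholders, not anything from a real corpus): each document stores only its active features, and anything absent from the set is implicitly "false".

```python
# Sparse bag-of-words: store only the active (present) words per document.
doc1 = {"OneWord", "AnotherWord", "AndSoOn"}       # 3 active features
doc2 = {"OneWord", "FirstWordNewSentence"}         # 2 active features

# Inactive features are never stored; membership tests recover them implicitly.
print("OneWord" in doc1)                # True  (active)
print("FirstWordNewSentence" in doc1)   # False (inactive, not listed anywhere)
```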
How many dimensions does my data have?
Is it the number of entries in the largest vector? Or is it the number of unique words? Or something else?
If you stick with the sparse feature representation, the more useful number is the average count of active features per document rather than a single "dimension". In your example, with two documents that have 3 and 2 active features respectively, that average is (3 + 2) / 2 = 2.5.
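The arithmetic above can be checked directly (the two document sets are the same hypothetical ones as before):

```python
# Average number of active features across documents: (3 + 2) / 2 = 2.5
docs = [
    {"OneWord", "AnotherWord", "AndSoOn"},      # 3 active features
    {"OneWord", "FirstWordNewSentence"},        # 2 active features
]
avg_active = sum(len(d) for d in docs) / len(docs)
print(avg_active)  # 2.5
```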
If you use a dense representation (e.g., one-hot encoding, which is not a good idea if the vocabulary is large), the input dimension equals your vocabulary size.
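To make that concrete, here is a sketch with a toy four-word vocabulary (the vocabulary itself is made up for illustration); the dense vector has one slot per vocabulary word, so its length is the vocabulary size:

```python
vocab = ["OneWord", "AnotherWord", "AndSoOn", "FirstWordNewSentence"]  # toy vocabulary
index = {w: i for i, w in enumerate(vocab)}

def to_dense(doc_words):
    # One slot per vocabulary word; 1 if the word is present, else 0.
    vec = [0] * len(vocab)
    for w in doc_words:
        vec[index[w]] = 1
    return vec

x = to_dense({"OneWord", "AnotherWord", "AndSoOn"})
print(len(x))  # 4 == vocabulary size, i.e., the input dimension
print(x)       # [1, 1, 1, 0]
```

With a realistic vocabulary of, say, 100,000 words, every document becomes a 100,000-dimensional vector that is mostly zeros, which is exactly why the dense form scales poorly.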
If you use 100-dimensional word embeddings and combine all the words' embeddings (e.g., by averaging them) into a single vector representing the document, then your input dimension is 100. In this case, the embedding converts your sparse features into dense features.
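A sketch of that averaging approach, assuming a hypothetical embedding table filled with random vectors (a real system would load trained embeddings instead):

```python
import random

random.seed(0)
EMB_DIM = 100

# Hypothetical embedding table: each word maps to a 100-dimensional vector.
embedding = {w: [random.gauss(0, 1) for _ in range(EMB_DIM)]
             for w in ["OneWord", "AnotherWord", "AndSoOn"]}

def doc_vector(words):
    # Average the word vectors so every document becomes one 100-dim input.
    vecs = [embedding[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

v = doc_vector(["OneWord", "AnotherWord", "AndSoOn"])
print(len(v))  # 100: the input dimension is the embedding size, not the vocab size
```

Note that the document vector's length stays 100 no matter how many words the document contains or how large the vocabulary is; only the embedding table grows with the vocabulary.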