Let's say I use the following for a single document:
from sklearn.feature_extraction.text import TfidfVectorizer

text = "bla agao haa"
# my_tokenizer is defined elsewhere (not shown)
singleTFIDF = TfidfVectorizer(analyzer='char_wb', ngram_range=(4, 6),
                              preprocessor=my_tokenizer, max_features=100).fit([text])
single = singleTFIDF.transform([text])
query = singleTFIDF.transform(["new coming document"])
If I understand correctly, transform just applies the weights learned during fit. So, for the new document, query contains the weight of each learned feature within that document. It looks like [[0, 0, 0.13, 0.4, 0]].
Since I use n-grams, I would also like to get the features themselves for this new document, so that I know which weight belongs to which feature.
EDIT:
In my case, I get the following arrays for single and query:
single
[[0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125 0.10721125 0.10721125 0.10721125
0.10721125 0.10721125 0.10721125]]
query
[[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0.57735027 0.57735027 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0.57735027 0. 0.
0. 0. 0. ]]
But this is strange: in the learned corpus (single), every feature has a weight of 0.10721125. So how can a feature of the new document have a weight of 0.57735027?
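One observation that may be relevant, sketched under the assumption that the default norm='l2' of TfidfVectorizer is in effect: every output row is scaled to unit length separately, so k equal non-zero weights in a row each become 1/sqrt(k). The counts below are read off the arrays above (87 entries in single, 3 non-zero entries in query); the variable names are only illustrative.

```python
import math

n_single = 87  # number of entries in `single` above, all equal
n_query = 3    # number of non-zero entries in `query` above

# Under L2 normalisation, k equal non-zero weights each become 1/sqrt(k)
w_single = 1 / math.sqrt(n_single)
w_query = 1 / math.sqrt(n_query)

print(round(w_single, 8), round(w_query, 8))
```

Both values agree with the arrays above to the printed precision, which would suggest that the numbers returned by transform are per-document normalised weights rather than fixed per-feature weights carried over from fit.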