I am trying to implement tf-idf in Python using sklearn.
Here's what I have so far:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ["This is very strange",
"This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_
dic = dict(zip(vectorizer.get_feature_names(), idf))
print(dic)
Now, when I change my corpus to my original dataset, which is like this:
corpus = [["This is very strange"],
["This is very nice"]]
and change the code to this:
vectorizer = TfidfVectorizer(min_df=1)
f = list()
for doc in corpus:
    X = vectorizer.fit_transform(doc)
    idf = vectorizer.idf_
    dic = dict(zip(vectorizer.get_feature_names(), idf))
    f.append(dic)
print(f)
It doesn't work.
So basically, my documents are now in a 2D list, whereas originally I had a 1D list of documents.
After calculating tf-idf, I want to apply classification to the result.
How can I get tf-idf working with this data?
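
One possible direction, sketched under the assumption that every inner list contains only document strings: flatten the 2D list back into a 1D list and fit the vectorizer once on the whole corpus. IDF is computed across documents, so fitting on one document at a time gives every term the same IDF value. The names `corpus_2d` and `idf_by_term` are illustrative and not part of the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Dataset shaped like the one in the question: each inner list holds one document.
corpus_2d = [["This is very strange"],
             ["This is very nice"]]

# Flatten to a 1D list of strings (assumes every inner item is a document string).
corpus = [doc for docs in corpus_2d for doc in docs]

vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)  # sparse matrix, one row per document

# vocabulary_ maps each term to its column index, which also indexes idf_.
idf_by_term = {term: float(vectorizer.idf_[idx])
               for term, idx in vectorizer.vocabulary_.items()}
print(idf_by_term)
```

Here `X` would be the feature matrix to pass to a classifier, with one row per document in `corpus`.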