I have a pandas dataframe which consists of two strings and one keyword per entry. It looks like this:
\n 05 Temmuz 2016 17:59 \
0 Suriyelilere vatandaşlığa neden karşı çıkılıyor
1 Selin Girit Kendi ülkesinde savaştan kaçacak s...
\n 10 Temmuz 2016 09:01 \
0 Öteki Suriyeliler: Türkiye vatandaşı olursak a...
1 Cumhurbaşkanı Tayyip Erdoğan Suriyelilere vata...
What I'm trying to do is use scikit-learn to get the tf-idf of each word in the second string, weighted against a corpus of general words. But I'm not really sure how to do that. If I use TfidfVectorizer() I end up with something that looks like this:
(0, 1) 0.520040083208
(0, 8) 0.307144050546
(0, 5) 0.307144050546
(0, 4) 0.520040083208
(0, 7) 0.520040083208
(1, 8) 0.326309521953
(1, 5) 0.326309521953
(1, 3) 0.420182921489
(1, 2) 0.552490047084
(1, 0) 0.552490047084
(2, 8) 0.294893556078
(2, 5) 0.294893556078
(2, 3) 0.759458290886
(2, 6) 0.499298193039
But this output doesn't give a score for every word individually (I only get index pairs, not the words themselves), and it's a comparison between the words in my own data's vocabulary, not against a general corpus. I'm not sure how to do what I'm looking for, and I was hoping someone might have some advice, as the scikit-learn documentation isn't very clear on this.
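
To make the goal concrete, here is a minimal sketch of the kind of result I'm after. It assumes the vectorizer can be fitted on a separate general corpus and then applied to my own text. The general_corpus list and the document string are made-up English placeholders, not my real data, and I'm not sure this is the recommended way to do it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for a large general-language corpus.
general_corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the weather is nice today",
    "citizenship law was debated in parliament",
]

# Hypothetical stand-in for the second string of one dataframe entry.
document = "the parliament debated citizenship for refugees"

# Learn the vocabulary and IDF weights from the general corpus only.
vectorizer = TfidfVectorizer()
vectorizer.fit(general_corpus)

# Score the document with those weights; the result is a 1 x n_terms sparse matrix,
# which prints as the "(row, column) value" pairs shown above.
row = vectorizer.transform([document])

# vocabulary_ maps term -> column index; invert it so columns can be read as words.
index_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}

# Pair every non-zero column of the document row with its word.
word_scores = {index_to_term[col]: row[0, col] for col in row.nonzero()[1]}

# Print words from highest to lowest tf-idf score.
for word, score in sorted(word_scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.3f}")
```

In this sketch, words that never occur in the general corpus ("for" and "refugees" here) are silently dropped, because the vocabulary is fixed when the vectorizer is fitted. A word that is common in the general corpus ("the") ends up with a lower score than the rarer ones.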