I have a DataFrame of term frequencies (tf). The columns are words, the rows are documents, and each row sums to 1.

| A    | B    | C   |
|------|------|-----|
| 0.12 | 0.18 | 0.7 |
| 0.1  | 0.8  | 0.1 |
| 0.6  | 0.4  | 0.0 |

What is the best or easiest way to weight these values by idf (inverse document frequency)?

The problem is that sklearn's tf-idf implementation expects word counts rather than term frequencies.

ScientiaEtVeritas
  • The easiest thing would be to pass df.values to an sklearn classifier... – cs95 Jul 11 '17 at 09:13
  • Possible duplicate of [What is the simplest way to get tfidf with pandas dataframe?](https://stackoverflow.com/questions/37593293/what-is-the-simplest-way-to-get-tfidf-with-pandas-dataframe) – Tiago Martins Peres Jul 11 '17 at 09:15

1 Answer

If you define idf as:

IDF(term, Documents) = |Documents| / (1 + |documents where tf(term) > 0|)

you can easily calculate the IDF value of a term by using:

len(df) / (1 + (df['term'] > 0).sum())
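
To weight the whole DataFrame rather than a single term, the same IDF definition can be applied to every column at once. The following is only a minimal sketch: it rebuilds the example table from the question, and the names `tf`, `doc_freq`, `idf` and `tfidf` are chosen here for illustration. It uses the un-logged IDF defined above, which is an assumption taken from this answer and differs from the smoothed logarithmic variant sklearn uses.

```python
import pandas as pd

# Term frequencies from the question: rows are documents, columns are terms.
tf = pd.DataFrame(
    {
        "A": [0.12, 0.1, 0.6],
        "B": [0.18, 0.8, 0.4],
        "C": [0.7, 0.1, 0.0],
    }
)

n_docs = len(tf)                  # |Documents|
doc_freq = (tf > 0).sum(axis=0)   # number of documents containing each term
idf = n_docs / (1 + doc_freq)     # one IDF value per term (un-logged variant)

# Scale each column by its term's IDF; the Series index is aligned
# with the DataFrame's column labels.
tfidf = tf.mul(idf, axis="columns")

print(idf)
print(tfidf)
```

With this definition a term that occurs in every document gets a weight below 1, so here columns A and B are scaled down while C, which is missing from one document, keeps its values.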
AndreyF