I have a database containing phrases, for example:
check work slow
work wallpapers
work needed reply notification working groups
I need to calculate the information gain for each distinct word.
- IG('work')
- IG('check')
- ....
I have studied the concepts of entropy and information gain, but I'm not sure how to apply them to phrases. I saw this link: https://mariuszprzydatek.com/2014/10/31/measuring-entropy-data-disorder-and-information-gain/ However, in my case the phrases have no categories (class labels). I need to know which words have the greatest information gain given only the phrases.
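To make the setup concrete, here is a minimal sketch of what I can compute so far. It assumes whitespace tokenization and counts each word at most once per phrase. The names `binary_entropy` and `word_presence_entropy` are just illustrative. Note that this gives the entropy of each word's presence across phrases, which is not information gain, since information gain needs a target variable (categories) that I don't have.

```python
import math
from collections import Counter

# The example phrases from above.
phrases = [
    "check work slow",
    "work wallpapers",
    "work needed reply notification working groups",
]

def binary_entropy(p):
    """Entropy in bits of a yes/no event that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def word_presence_entropy(phrases):
    """Map each distinct word to the entropy of 'word occurs in a phrase'."""
    # Assumption: words are separated by whitespace; repeats within
    # a phrase count once (document frequency).
    doc_freq = Counter()
    for phrase in phrases:
        doc_freq.update(set(phrase.split()))
    n = len(phrases)
    return {word: binary_entropy(count / n) for word, count in doc_freq.items()}

scores = word_presence_entropy(phrases)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

With these three phrases, 'work' occurs in every phrase and so scores 0, while every other word occurs in one phrase out of three and scores about 0.918 bits. What I don't know is how to get from this to an information gain per word without categories.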