
I have a file that contains some data.

An example of the data I have:

+------------+---------------------------------+-------------------------+
|  SOC Code  |              Title              |  Occupational Category  |
+------------+---------------------------------+-------------------------+
| 11-1011.03 | Chief Sustainability Officers   | New & Emerging          |
| 11-1021.00 | General and Operations Managers | Enhanced Skills         |
+------------+---------------------------------+-------------------------+

I need to find the most frequent words in the file. Any ideas on how this can be done? Example pieces of code would be appreciated.

Eng.Reem
  • Welcome to stackoverflow. Check out the wikipedia entry on TF-IDF and you'll see that it is not meaningful if you have a single document -- you need a collection of many documents, and TF-IDF chooses among them. You probably need a different metric, and you definitely need a better problem statement. Note that on this site, _you_ give us pieces of code and we help you improve it. – alexis May 27 '17 at 20:15
  • Read this relevant Q: https://stackoverflow.com/q/42269313/7414759 – stovfl May 28 '17 at 11:21
  • This has nothing to do with PyCharm. It's just an editor. You can write a Python program to operate on CSV files in any number of editors. – Chet Jun 07 '17 at 20:56

1 Answer


You could count the words using the NLTK FreqDist method and return the most frequent ones.
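A minimal sketch of the idea, using the standard library's `collections.Counter` (which `nltk.FreqDist` subclasses, so it exposes the same `most_common` method). The sample text below stands in for your file contents and is taken from the rows in your question; in practice you would read the file yourself, e.g. with `open("yourfile.txt").read()` (the filename is hypothetical):

```python
from collections import Counter
import re

# Hypothetical sample standing in for the file contents.
text = """
11-1011.03  Chief Sustainability Officers   New & Emerging
11-1021.00  General and Operations Managers Enhanced Skills
"""

# Lowercase and keep alphabetic tokens only, which drops the SOC codes.
words = re.findall(r"[a-z]+", text.lower())

# most_common(n) returns the n most frequent (word, count) pairs.
# With NLTK installed, nltk.FreqDist(words).most_common(3) works the same way.
print(Counter(words).most_common(3))
```

If the column structure matters (e.g. you only want words from the Title column), parse the file as a table first instead of tokenizing the raw text.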

lvcasco