I have a set of data in a text file and I would like to build a frequency table based on pre-defined words (drive, street, i, lives). Below is an example:
ID | Text
---|--------------------------------------------------------------------
1 | i drive to work everyday in the morning and i drive back in the evening on main street
2 | i drive back in a car and then drive to the gym on 5th street
3 | Joe lives in Newyork on NY street
4 | Tod lives in Jersey city on NJ street
Here is what I would like to get as output:
ID | drive | street | i | lives
----|--------|----------|------|-------
1 | 2 | 1 | 2 | 0
2 | 2 | 1 | 1 | 0
3 | 0 | 1 | 0 | 1
4 | 0 | 1 | 0 | 1
Here is the code I'm using. It can count the words in the file, but it does not solve my need: I would like to restrict the counts to the set of pre-defined words so I get the table shown above.
from collections import Counter
from nltk.corpus import stopwords

# Read the file and lower-case every token (raw string so the
# backslashes in the Windows path are not treated as escapes)
xy = open(r'C:\Python\data\file.txt').read().split()
words = [w.lower() for w in xy]

# Drop English stop words
stopset = set(stopwords.words('english'))
filtered_words = [word for word in words if word not in stopset]

print(Counter(filtered_words))
print(len(filtered_words))
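One possible approach for the pre-defined-word table: tokenize each row's text, build a Counter per row, and look up only the target words (a missing word gives 0). Note that NLTK's English stop list contains "i", so the stop-word filtering above would remove one of the target words; the sketch below skips that step. The sample rows are inlined here instead of reading the file, purely for illustration:

```python
import re
from collections import Counter

TARGETS = ["drive", "street", "i", "lives"]

rows = {
    1: "i drive to work everyday in the morning and i drive back in the evening on main street",
    2: "i drive back in a car and then drive to the gym on 5th street",
    3: "Joe lives in Newyork on NY street",
    4: "Tod lives in Jersey city on NJ street",
}

table = {}
print("ID | " + " | ".join(TARGETS))
for rid, text in rows.items():
    # Tokenize on letters/digits so punctuation does not stick to words
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    # Counter returns 0 for words that never occur in this row
    table[rid] = [counts[w] for w in TARGETS]
    print(f"{rid}  | " + " | ".join(str(c) for c in table[rid]))
```

On the sample data this produces the counts in the desired output (e.g. row 1 gives 2, 1, 2, 0).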