I am trying to extract keywords from four different text files and attach a word frequency to each keyword. The text files look like this:
test1 = apple banana lemon
test2 = apple banana
test3 = lemon apple lemon
test4 = apple lemon grape
I think the issue is in the bolded code (second paragraph); I am not sure how I should construct the initial dictionaries.
test1= [line.rstrip('\n') for line in open("test1.txt")]
test2= [line.rstrip('\n') for line in open("test2.txt")]
test3= [line.rstrip('\n') for line in open("test3.txt")]
test4= [line.rstrip('\n') for line in open("test4.txt")]
**
text_file = test1, test2, test3, test4
word_frequencies = 0
text_collection = {}
**
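import re  # needed for re.split below

def dictionary(text):
    # Split on runs of non-word characters and drop the empty strings
    # that a trailing newline would otherwise leave behind
    keywords = [word for word in re.split(r'\W+', text) if word]
    print(text)
    word_frequencies = dict()
    for word in keywords:
        if word in word_frequencies:
            word_frequencies[word] += 1
        else:
            word_frequencies[word] = 1
    return word_frequencies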
for all in text_file:
    file = open(all)
    text = file.read()
    print(file)
    text_collection[all] = dictionary(text)
print(text_collection)
Desired output:
{'test1.txt': {'apple': 1, 'banana': 1, 'lemon': 1},
'test2.txt': {'apple': 1, 'banana': 1},
'test3.txt': {'apple': 1, 'lemon': 2},
'test4.txt': {'apple': 1, 'lemon': 1, 'grape': 1}}
I would rather the answers not use imported libraries. This code is more for practice than efficiency :)
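For comparison, here is a minimal sketch of the structure the desired output seems to call for. It assumes the outer dictionary is keyed by file name, so the loop is given a list of file name strings rather than the already-read line lists. The names `count_words` and `build_collection` are hypothetical, and splitting on whitespace with `str.split()` is a swapped-in simplification that avoids any imports.

```python
# Hypothetical helper names; a sketch, not a definitive implementation.

def count_words(text):
    """Return a dict mapping each whitespace-separated word in text to its count."""
    frequencies = {}
    # split() with no argument splits on any whitespace, including newlines,
    # and never produces empty strings
    for word in text.split():
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies


def build_collection(filenames):
    """Return {file name: word-count dict} for every name in filenames."""
    collection = {}
    for name in filenames:
        # 'with' closes the file once the block ends
        with open(name) as handle:
            collection[name] = count_words(handle.read())
    return collection
```

Under these assumptions, `build_collection(["test1.txt", "test2.txt", "test3.txt", "test4.txt"])` would return a nested dictionary shaped like the desired output above, provided the four files exist in the working directory.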