I'm trying to use the Loughran/McDonald dictionary to classify the tone of financial texts.
Here is some code I found online:
# Get tone dictionary
import re
with open('lmdict.txt') as list:
    lines = list.readlines()
dict = {}
for l in lines:
    if l[0:2] == '>>':
        cat = l[2:].strip()
        dict[cat] = []
    else:
        l = l.strip()
        if l:
            dict[cat].append(l)
# Set up regular expressions
regex = {}
for cat in dict.keys():
    pattern = '\\b(?:' + '|'.join(dict[cat]) + ')\\b'
    regex[cat] = re.compile(pattern, re.IGNORECASE)
# Get tone count
text = "Bsp.text"
wordcount = len(text.split())
for cat in count.keys():
    count[cat] = len(regex[cat].findall(text))
print(count)
A few errors occurred before, so I added import re and text = "Bsp.text" to assign the document I'd like to classify to the variable text (I hope I did that correctly?). Unfortunately, there is another error now:
Traceback (most recent call last):
  File "C:\Users\M\Desktop\Python34\xWordlist.py", line 25, in <module>
    for cat in count.keys():
NameError: name 'count' is not defined
How can I fix this? I'm new to Python, so if there are any other mistakes in the code, please let me know. I'd really appreciate it!
UPDATE: I changed the last part of the code, and it is working now:
# Get tone count
with open('Bsp.txt', 'r') as content_file:
    content = content_file.read()
count = {}
wordcount = len(content.split())
for cat in dict.keys():
    count[cat] = len(regex[cat].findall(content))
print(count)
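For anyone landing here later, the same three steps (parse the dictionary, build one regex per category, count matches) could be sketched roughly as below. This is only a minimal sketch, not the original author's code: the function names parse_tone_dictionary, compile_patterns and count_tone are made up for illustration, the '>>Category' header format is assumed from the snippet above, and the in-memory sample lines and sentence are hypothetical stand-ins for lmdict.txt and Bsp.txt. It also avoids naming variables list and dict (which hides the built-ins) and adds re.escape, which the original snippet did not use.

```python
import re


def parse_tone_dictionary(lines):
    """Group words under the most recent '>>Category' header line."""
    categories = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith('>>'):
            current = line[2:].strip()
            categories[current] = []
        elif line and current is not None:
            categories[current].append(line)
    return categories


def compile_patterns(categories):
    """Build one case-insensitive whole-word regex per category."""
    patterns = {}
    for cat, words in categories.items():
        if not words:
            continue  # skip empty categories: an empty alternation matches empty strings
        # re.escape is an addition here; it guards against regex
        # metacharacters appearing in dictionary entries.
        alternation = '|'.join(re.escape(w) for w in words)
        patterns[cat] = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    return patterns


def count_tone(text, patterns):
    """Create the result dict first, then fill in one count per category."""
    count = {}
    for cat, rx in patterns.items():
        count[cat] = len(rx.findall(text))
    return count


# Hypothetical sample data standing in for lmdict.txt and Bsp.txt
sample_lines = ['>>Negative', 'loss', 'decline', '', '>>Positive', 'gain', 'improve']
sample_text = 'The loss widened, but a gain in Q4 offset the decline. Loss again.'

patterns = compile_patterns(parse_tone_dictionary(sample_lines))
counts = count_tone(sample_text, patterns)
wordcount = len(sample_text.split())
print(counts, wordcount)
```

With real files, the sample data would be replaced by reading them, e.g. passing the open file object for lmdict.txt to parse_tone_dictionary and the result of read() on Bsp.txt to count_tone. Note that the whole-word pattern does not count "again" as a match for "gain".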