
I'm trying to use the Loughran/McDonald dictionary to classify the tone of financial texts.

Here is some code I found online:

# Get tone dictionary

import re

with open('lmdict.txt') as list:
    lines = list.readlines()
dict = {}
for l in lines:
    if l[0:2] == '>>':
        cat = l[2:].strip()
        dict[cat] = []
    else:
        l = l.strip()
        if l:
            dict[cat].append(l)

# Set up regular expressions
regex = {}
for cat in dict.keys():
    pattern = '\\b(?:' + '|'.join(dict[cat]) + ')\\b'
    regex[cat] = re.compile(pattern, re.IGNORECASE)

# Get tone count
text = "Bsp.text"

wordcount = len(text.split())
for cat in count.keys():
    count[cat] = len(regex[cat].findall(text))
print(count)

A few errors occurred earlier, so I added import re and text = "Bsp.text" to assign the document I'd like to classify to the variable text (I hope I did that correctly?). Unfortunately, there is another error now:

Traceback (most recent call last):
  File "C:\Users\M\Desktop\Python34\xWordlist.py", line 25, in <module>
    for cat in count.keys():
NameError: name 'count' is not defined

How can I fix this? I'm new to Python, so if there is any other mistake in the code, please let me know. I'd really appreciate it!

UPDATE: I changed the last part of the code, and it is working now:

# Get tone count

with open('Bsp.txt', 'r') as content_file:
    content = content_file.read()


count = {}
wordcount = len(content.split())
for cat in dict.keys():
    count[cat] = len(regex[cat].findall(content))

print(count)
M. Civ
  • That's right, it's not defined. It's not clear why you thought it would be; where do you expect `count` to come from? – jonrsharpe Aug 01 '17 at 09:28
  • I think the variable name is wrong. You never assigned `count`. The variable assigned is `wordcount`. Where is `count` expected to be assigned? – skjoshi Aug 01 '17 at 09:29
  • Thank you. I edited the last part of the code. Yet, the output is: {'negative': 0, 'positive': 0}. Why is it still 0? There are definitely negative and positive words from the dictionary in the text... What do I have to add to make it count? @skjoshi – M. Civ Aug 02 '17 at 07:28
  • @M.Civ if you have a different problem, it should be a different question. – Baldrickk Aug 02 '17 at 09:03
  • ^ and there are a number of other problems. To begin with, look up how to read files. – Baldrickk Aug 02 '17 at 09:07
  • @Baldrickk Well, I will heed your advice. Would be happier if my code worked though. :) – M. Civ Aug 02 '17 at 09:33

1 Answer

Your count variable is never assigned... Perhaps you mean:

count = {}
for cat in dict.keys():
  ...

Also, I do not see any increment of your count variables. Perhaps:

count[cat] = len(regex[cat].findall(text))

Should be:

if cat not in count:
  count[cat] = 0
count[cat] += len(regex[cat].findall(text))

I added the '+' before the '=' sign...
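
For illustration, here is a minimal end-to-end sketch of the counting step. It assumes the word list uses '>>category' header lines followed by one word per line, as the question's parsing loop implies. The helper names parse_lexicon and count_tone, the sample words, and the sample text are made up for this example and are not part of the Loughran/McDonald files.

```python
import re


def parse_lexicon(lines):
    """Group words under the most recent '>>category' header line."""
    lexicon = {}
    category = None
    for line in lines:
        line = line.strip()
        if line.startswith('>>'):
            category = line[2:].strip()
            lexicon[category] = []
        elif line and category is not None:
            lexicon[category].append(line)
    return lexicon


def count_tone(text, lexicon):
    """Count whole-word, case-insensitive matches for each category."""
    counts = {}
    for category, words in lexicon.items():
        if not words:
            # An empty alternation would match at every word boundary.
            counts[category] = 0
            continue
        # re.escape guards against words containing regex metacharacters.
        pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
        counts[category] = len(re.findall(pattern, text, re.IGNORECASE))
    return counts


# Hypothetical sample data standing in for lmdict.txt and the document.
sample_lexicon = parse_lexicon(
    ['>>negative', 'loss', 'decline', '', '>>positive', 'gain', 'improve']
)
sample_text = "The loss widened, but margins improve and gains are expected. Loss again."
print(count_tone(sample_text, sample_lexicon))  # {'negative': 2, 'positive': 1}
```

Because of the \b word boundaries, "gains" does not count as a match for "gain"; only exact word forms listed in the lexicon are counted. To classify a real document, the text would come from reading the file's content rather than from a string holding the file name.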

Note: using dict as a variable name is not a good idea, as it can lead to unintended consequences; at best it will confuse the reader. dict is the built-in class that represents dictionaries.
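
A small sketch of what can go wrong with that name. The function shadowing_demo is hypothetical and exists only to keep the shadowing local, so the built-in stays usable elsewhere.

```python
def shadowing_demo():
    # Inside this function, the local name `dict` hides the built-in class.
    dict = {}
    try:
        dict()  # this now calls the dictionary instance, not the built-in
    except TypeError as exc:
        return str(exc)
    return None


print(shadowing_demo())
```

Picking another name, such as lexicon or tone_words, avoids the problem entirely.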

ant1g
  • You might want to note that "dict" is not a good name to use, as it is the name of a Python built-in. – Baldrickk Aug 01 '17 at 11:01
  • Thank you. I edited the last part of the code. Yet, the output is: {'negative': 0, 'positive': 0}. Why is it still 0? There are definitely negative and positive words from the dictionary in the text... What do I have to add to make it count? @Baldrickk – M. Civ Aug 02 '17 at 07:27
  • @M.Civ I edited my answer... I think you missed the increment. – ant1g Aug 02 '17 at 08:08
  • @aramaki I was thinking more of: `dict = dict(); a=dict()` resulting in `TypeError: 'dict' object is not callable` - a bit worse than confusing the user. – Baldrickk Aug 02 '17 at 08:35
  • I edited the question, so you can see the 'new' code. There is another error now... do you have any idea how to solve this? – M. Civ Aug 02 '17 at 08:53
  • @M.Civ is your 'regex' still a dictionary? – ant1g Aug 02 '17 at 08:59
  • @aramaki yes it is – M. Civ Aug 02 '17 at 09:35
  • @M.Civ my proposed solution was wrong. I edited it, the counts should be first initialized. However, I am not sure this is why you got that exception... – ant1g Aug 02 '17 at 09:40
  • @aramaki Ok, thank you for the help though! I appreciate it – M. Civ Aug 02 '17 at 09:45
  • @M.Civ also, what is 'Bsp.txt'? Is it a file name? If so, note that you are not actually reading the file content. You are reading the 'Bsp.txt' as a string. Not sure what you want. – ant1g Aug 02 '17 at 09:47
  • @aramaki yes, it is the file name. How do i read the content then? Sorry, I am really new to programming. Is it with open("filename", 'r')? – M. Civ Aug 02 '17 at 09:49
  • @M.Civ yes: https://stackoverflow.com/questions/7409780/reading-entire-file-in-python – ant1g Aug 02 '17 at 09:53