
How can I find out how many keywords from one file also appear in another file?

I have a file (keywords.txt) containing a list of keywords, and I'm trying to find out whether another file (tweets.txt), which contains sentences, contains any of those keywords.

def main():
    done = False
    while not done:
        try:
            keywords = input("Enter the filename titled keywords: ")
            with open(keywords, "r") as words:
                done = True
        except IOError:
            print("Error: file not found.")

total = 0
try:
    tweets = input("Enter the file name titled tweets: ")
    with open(tweets, "r") as tweets:
        pass  # I'm not sure what to do with the file here
except IOError:
    print("Error: file not found.")

def sentiment_of_msg(msg_words_counter):
    summary = 0
    for line in tweets:
        if happy_dict in line:
            # summary += 10 * (the number of keywords in the sentence) -- this is the part I can't figure out
            pass
        elif veryUnhappy_dict in line:
            summary += 1 * quantity
        elif neutral_dict in line:
            summary += 5 * quantity
    return summary
  • First read the text from the files into memory. Right now you open the files but then do nothing with them; do the calculations afterwards. – furas Nov 09 '16 at 20:56
  • No one wants to do your homework for you, for many reasons. Ask a specific question to solve one part of your problem. Right now you're not even close. What happens after `with open(tweets, 'r') as tweets:`? – Alex Hall Nov 09 '16 at 20:57
  • @AlexHall If you're not going to make any suggestions or provide help, I'd appreciate it if you didn't comment. Thanks! – HelloWorld4382 Nov 09 '16 at 20:59
  • Haha, you're not helping your case here at all. I am giving you a suggestion. Narrow down your question to one specific problem. If you are having issues reading files, ask a question about that. If you feel confident about files, take files out of your question and replace them with a static list of strings. – Alex Hall Nov 09 '16 at 21:02

1 Answer


I'm sensing that this is homework so the best I can do is give you an outline for the solution.

If you can afford to load the files into memory:

  • Load keywords.txt, read its lines, split them into tokens, and build a set from the tokens. Now you have a data structure capable of fast membership queries (i.e. you can ask if token in set and get an answer in constant time).
  • Load the tweets file the same way and read its contents line by line (or however they are formatted). You might need to do some preprocessing (stripping whitespace, removing unnecessary characters such as commas, dropping invalid words, etc.). For every line, split it so you get the words of each tweet, and ask whether any of the split words are in the keywords set.

Pseudocode would look like this:

keywords_set = set()
with open("keywords.txt") as keyword_file:
    # Build a set of keywords for constant-time membership checks.
    for line in keyword_file:
        for word in line.split():
            keywords_set.add(word)

with open("tweets.txt") as tweet_file:
    for line in tweet_file:
        line = preprocess(line)  # function with your custom logic
        for item in line.split():
            if item in keywords_set:
                do_stuff()  # function with your custom logic
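
The preprocess step might look something like this (just a sketch; adjust the cleanup to your data):

import string

def preprocess(line):
    # Lowercase, strip surrounding whitespace, and remove punctuation
    # so that "Happy," and "happy" count as the same word.
    line = line.strip().lower()
    return line.translate(str.maketrans("", "", string.punctuation))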

If you want the frequency of each keyword, build a dictionary of {keyword: frequency}. Or check out collections.Counter and think about how you could solve your problem with it.
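
For instance, a minimal sketch of the Counter approach, assuming the keywords_set and preprocess from the pseudocode above:

from collections import Counter

keyword_counts = Counter()
with open("tweets.txt") as tweet_file:
    for line in tweet_file:
        words = preprocess(line).split()
        # Count only the words that are also keywords.
        keyword_counts.update(w for w in words if w in keywords_set)

print(keyword_counts.most_common(10))  # ten most frequent keywords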

If you cannot load the tweets file into memory, consider a lazy solution for reading the big file using generators.
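
A lazy version could be a generator that yields one preprocessed tweet at a time, so only a single line is held in memory (a sketch reusing preprocess, keywords_set and do_stuff from above):

def tweets_from(path):
    # Lazily yield one preprocessed tweet per line of the file.
    with open(path) as tweet_file:
        for line in tweet_file:
            yield preprocess(line)

for tweet in tweets_from("tweets.txt"):
    if any(word in keywords_set for word in tweet.split()):
        do_stuff()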

themistoklik