How can I get two txt files by finding common occurrences?

Question

I need to know which English words were used in the Italian chat and to count how many times they were used.

But the output also includes words that I didn't use in the example chat (for example, 'baby-blue-eyes': 0).

english_words = {}

with open("dizionarioen.txt") as f:
  for line in f:
    for word in line.strip().split():
      english_words[word] = 0

with open("_chat.txt") as f:
  for line in f:
    for word in line.strip().split():
      if word in english_words:
        english_words[word] += 1

print(english_words)

Welcome to StackOverflow. What have you tried to achieve your goal? – crissal, Jun 11 '21 at 13:55
I don't understand your question. Could you add some example input and output? – Relandom, Jun 11 '21 at 13:55
Does this answer your question? [How can I count the occurrences of a list item?](https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item) – crissal, Jun 11 '21 at 14:00
Apart from the fact that your code will be quite slow for large files and compares words case-sensitively, it should work and give you the occurrences of words in the chat. What exactly is your problem? Where do you need help? – Martin Wettstein, Jun 11 '21 at 14:09

Relandom · Accepted Answer · 2021-06-11T14:46:07.917

You can simply iterate over your result and keep only the elements whose value is not 0:

english_words = {}

with open("dizionarioen.txt") as f:
  for line in f:
    for word in line.strip().split():
      english_words[word] = 0

with open("_chat.txt") as f:
  for line in f:
    for word in line.strip().split():
      if word in english_words: 
        english_words[word] += 1

result = {key: value for key, value in english_words.items() if value}
print(result)
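
To see the filtering step in isolation, here is a minimal sketch that uses in-memory lists instead of the two files. The sample words and chat lines (`dictionary_words`, `chat_lines`) are made up for illustration and are not taken from the question.

```python
# Hypothetical sample data standing in for dizionarioen.txt and _chat.txt.
dictionary_words = ["hello", "meeting", "baby-blue-eyes", "ok"]
chat_lines = ["ciao hello a tutti", "ok ci vediamo al meeting", "ok ok"]

# Start every dictionary word at zero, as in the code above.
english_words = {word: 0 for word in dictionary_words}

# Count only the chat words that are in the dictionary.
for line in chat_lines:
    for word in line.strip().split():
        if word in english_words:
            english_words[word] += 1

# Keep only the words that were actually used in the chat.
result = {key: value for key, value in english_words.items() if value}
print(result)
```

With this sample data, 'baby-blue-eyes' never appears in the chat, so it stays at 0 in `english_words` but is absent from `result`.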

Here is another solution that counts the words using Counter:

from collections import Counter

with open("dizionarioen.txt") as f:
    all_words = set(word for line in f for word in line.split())

with open("_chat.txt") as f:
    result = Counter([word for line in f for word in line.split() if word in all_words])

print(result)
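
As noted in the comments on the question, this comparison is case-sensitive, and a token such as "Ok," will not match "ok". The following is only a sketch of one possible way to handle that, assuming it is acceptable to lower-case every token and strip punctuation from its ends. The helper names `normalize` and `count_english_words` and the sample data are hypothetical.

```python
import string
from collections import Counter


def normalize(token):
    """Lower-case a token and strip punctuation from both ends."""
    return token.strip(string.punctuation).lower()


def count_english_words(dictionary_lines, chat_lines):
    """Count chat tokens that appear in the dictionary, ignoring case."""
    all_words = {normalize(w) for line in dictionary_lines for w in line.split()}
    all_words.discard("")  # tokens made only of punctuation normalize to ""
    tokens = (normalize(w) for line in chat_lines for w in line.split())
    return Counter(t for t in tokens if t in all_words)


# Hypothetical sample data; any iterable of lines works here.
dictionary_lines = ["Hello", "meeting ok"]
chat_lines = ["Ciao, HELLO a tutti!", "Ok, ci vediamo al meeting.", "ok?"]

counts = count_english_words(dictionary_lines, chat_lines)
print(counts.most_common())
```

Because the function only iterates over lines, the two open file objects from the code above could be passed in directly in place of the sample lists. Hyphens inside a word such as "baby-blue-eyes" are preserved, since only leading and trailing punctuation is stripped.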

edited Jun 11 '21 at 14:46

answered Jun 11 '21 at 14:18

Relandom

@Victory74 you need to print `result`, since we created a new dictionary here. – Relandom Jun 11 '21 at 14:31
@Victory74 I have made an update, because I didn't understand your question correctly. Try it now. – Relandom Jun 11 '21 at 14:44
I would pick the second one. It is faster and cleaner since we are using "Counter" from collections. In your approach, you are creating a lot of data that you later want to remove. In the second, you only use the data that you really want. – Relandom Jun 11 '21 at 14:53
@Victory74 Sorry, but I think you need to write a new question about that. There is simply not enough space in the comment section to put all of your code, and right now I don't know what you mean. Also, if this is the correct answer, you can accept it. – Relandom Jun 11 '21 at 15:06

score 0 · Answer 2 · answered Jun 11 '21 at 14:19

If you want to remove the words with no occurrences after counting, just delete these entries:

for w in list(english_words.keys()):
    if english_words[w] == 0:
        del english_words[w]

Then, your dictionary only contains words that occurred. Was that the question?

Martin Wettstein

No, the dictionary contains more words than those used in the Italian chat – Victory74 Jun 11 '21 at 14:25