How do I fix my code so that it does not count a repeated word separately?

Question

So, my code reads a poem from a file. It then counts the number of times each word occurs and stores the counts in a dictionary. However, my code treats repeated words as different words and counts them separately.

Here is my code:

def unique_word_count(file):
    words = open(file, "r")
    lines = words.read()
    words = lines.split()
    counts = dict()

    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1

    return counts

For example, if we have the string "Hi, hi, hi how are you", the output of my code would come out as:

{"Hi":1, "hi":1, "hi":1, "how":1, "are":1, "you":1}

While it should come out as:

{"hi":3, "how":1, "are":1, "you":1}

How can I fix my code so that it does not repeat words? Thank you!

Note that indenting only the first line doesn't format your code correctly. Use code blocks (indent ALL code by 4 spaces) or code fences (three backticks `\`\`\`` on the lines before and after your code) [Formatting help](/help/formatting) — Pranav Hosangadi, Dec 14 '22 at 04:14
The output you claim to obtain is impossible -- Keys in a dict are unique, so it is not possible for a dict to have two `"hi"` keys. You probably got `{"Hi": 1, "hi":2, "how":1, "are":1, "you":1}`, but if you truly understand how your code works, combining those two `"hi"` values should be trivially easy using `str.lower` — Pranav Hosangadi, Dec 14 '22 at 04:15
Does this answer your question? [How do I lowercase a string in Python?](https://stackoverflow.com/questions/6797984/how-do-i-lowercase-a-string-in-python) — Pranav Hosangadi, Dec 14 '22 at 04:17
Hello, I understand combining the "Hi" and "hi", but my code actually does count the two "hi" separately. Not sure why. — flory, Dec 14 '22 at 04:19
Look at your dictionary carefully. The first two keys contain a trailing comma, since `.split()` only splits by spaces. Looks like you need a better definition of what a "word" is. — Pranav Hosangadi, Dec 14 '22 at 04:21
The string is just an example. The actual poem in the file is very long, and I don't think I could post it entirely here, but I could show part of the input and output. Part of the poem: "Row, row, row your boat". The output of my code is {"Row": 10, "row": 10, "row": 10, "your": 10, "boat": 10}. Note: the count is ten because the poem repeats the line multiple times. — flory, Dec 14 '22 at 04:24
Again, that is not possible. Your dictionary can NOT have two `"row"` keys. It has one `"row"` key, and one `"row,"` key. *Notice the **trailing comma** in the second key*. Now you need to ask how you can remove trailing commas (or periods or other symbols) from a word. — Pranav Hosangadi, Dec 14 '22 at 04:25
As an aside, you're currently not closing the file before the function ends, or before you repurpose the variable `words`. After the line `lines = words.read()` you need to add `words.close()` or change that part of the function to use the file context manager. — nigh_anxiety, Dec 14 '22 at 04:38
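
Putting the comments above together, a minimal sketch might look like the following. It assumes that a "word" is a whitespace-separated token with leading and trailing punctuation removed, which is only one possible definition. The file is opened with a context manager so that it is closed automatically; the `encoding` argument is an assumption about how the poem file was saved.

```python
import string


def unique_word_count(file):
    # The with-block closes the file even if reading raises an error.
    # encoding="utf-8" is an assumption about how the poem was saved.
    with open(file, "r", encoding="utf-8") as handle:
        text = handle.read()

    counts = {}
    for token in text.split():
        # Lowercase so "Hi" and "hi" match, then drop punctuation at
        # either end so "hi," and "hi" become the same key.
        word = token.lower().strip(string.punctuation)
        if not word:
            # The token was punctuation only, e.g. a lone "-".
            continue
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Note that `strip` only touches the ends of each token, so punctuation inside a word (as in "don't") is kept.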

Aditya Nagar · Answer 1 · 2022-12-14T04:51:12.597

1

The reason you're getting that output is split().

split() splits the string wherever there is whitespace, so punctuation stays attached to the words. That means it treats "hi," and "hi" as different words. Try this code instead:

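import re

def word_count(text):
    # Lowercase first so "Hi" and "hi" are counted together, then pull out
    # runs of word characters so punctuation such as "," is not part of a word.
    words = re.findall(r'\w+', text.lower())
    counts = dict()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1

    return counts

print(word_count('Hi, hi, hi how are you'))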

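The usual way to find words in a string is to use split, but that can fail when punctuation is attached to a word: "hi," and "hi" end up as different keys. A regular expression lets you say exactly which characters make up a word.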

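\w (lowercase w) matches a single "word" character: a letter, a digit, or an underscore. For ASCII text that is [a-zA-Z0-9_]; in Python 3, str patterns also match non-ASCII letters and digits unless re.ASCII is set. The + means "one or more", so \w+ matches a whole run of word characters.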
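
A short sketch of that behaviour on a made-up sample string: `\w` under Python's default Unicode rules versus `re.ASCII`, and what happens to an apostrophe, which is not a word character.

```python
import re

# Made-up sample text; "\u00e9" is the single character "é".
sample = "Caf\u00e9, don't stop!"

# Default (Unicode) matching: "é" counts as a word character.
unicode_words = re.findall(r"\w+", sample.lower())

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so "é" ends the match.
ascii_words = re.findall(r"\w+", sample.lower(), flags=re.ASCII)

print(unicode_words)  # ['café', 'don', 't', 'stop']
print(ascii_words)    # ['caf', 'don', 't', 'stop']
```

If contractions should stay whole, the pattern would need to allow apostrophes as well. Whether that is wanted depends on how a "word" is defined for the poem.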

findall returns a list of every non-overlapping match of the pattern, in order. Here each match is a run of word characters, so commas, periods, and other punctuation never end up in the list.
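
To see why the regex matters, here is a small side-by-side sketch on the example string from the question. `collections.Counter` is used only as shorthand for the counting loop; it is not part of the code above.

```python
import re
from collections import Counter

text = "Hi, hi, hi how are you"

# split() only breaks on whitespace, so commas stay attached.
split_counts = Counter(text.lower().split())

# findall(r"\w+") returns runs of word characters, leaving commas out.
regex_counts = Counter(re.findall(r"\w+", text.lower()))

print(dict(split_counts))  # {'hi,': 2, 'hi': 1, 'how': 1, 'are': 1, 'you': 1}
print(dict(regex_counts))  # {'hi': 3, 'how': 1, 'are': 1, 'you': 1}
```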

Edit: Added explanation

edited Dec 14 '22 at 04:51

answered Dec 14 '22 at 04:26

Aditya Nagar


A good answer should also include what it is that makes this code work over the original – Pranav Hosangadi Dec 14 '22 at 04:27
Added the explanation – Aditya Nagar Dec 14 '22 at 04:31
_"..., but this can fail"_: in what situation? _"findall function filters the string"_: Not really, `findall` finds all matches to the given regex. Why did you give it that regex though? That is the important idea in your answer that needs explanation. – Pranav Hosangadi Dec 14 '22 at 04:37
