open .txt file and place every word within a dictionary

Question

I wish to open a .txt file and enter all of the words within the file into a dictionary. Afterwards, I would like to count the total number of words in the dictionary.

The .txt file contains 5 lines:

elephant calculator fish
towel onion fish
nandos pigeon tiger
cheeky peg lion
dog cat fish

This is what I currently have:

words = 0 
dictionary = []
with open('file.txt','r') as file:
    for x in inf:
        dictionary.split(x)
        words += 1
print(words)

Sorry about the awfully constructed question.

Sorry but this is not a service that produces code. What have you done so far? — putvande, Mar 13 '16 at 19:49
Oh a newbie, quick everyone beat them down with passive aggressive comments so they learn not to ask again! Better not provide supportive and constructive feedback, that is below us gods of software. — Nick Isaacs, Mar 13 '16 at 19:53
You need to post your best effort at solving the problem and then we can work with that code. That gives us specific code related problems to work with. — tdelaney, Mar 13 '16 at 20:00
@zondo I have edited the question with what I currently have, in the future I will provide more information — my name jeff, Mar 13 '16 at 20:01
Yeah code, that's what we needed. You created a list not a dict but that's good because a list is what you want. — tdelaney, Mar 13 '16 at 20:01
Now there's just one more thing needed: what is wrong with what you have? Does it throw some errors or does it come up with the wrong result? — zondo, Mar 13 '16 at 20:03
@ I am currently getting a "**syntax error near unexpected token `words'**", before I had it so that it would take every line and insert it into a dictionary, but this is not what I wanted, as I want to count the amount of words, not lines. It came up as _count = 5_ — my name jeff, Mar 13 '16 at 20:05
There's so much wrong here (list != dictionary, `inf` not defined, `split` on a list). @mynamejeff, it looks like you're trying to run before you can walk. There are loads of good tutorials online; I advise you to spend a few hours on the basics first — Alastair McCormack, Mar 13 '16 at 20:08
Why do you need a dictionary? They are key:value pairs which means that you need to associate two things about the words. What is it you want to track with the word. For instance `"elephant":what?` — tdelaney, Mar 13 '16 at 20:12
Are you interested in the number of times the word appears in the text? — tdelaney, Mar 13 '16 at 20:13
@tdelaney yeah, but duplicate words that show up later within the text file should not be counted — my name jeff, Mar 13 '16 at 20:17
You want to count them... but not count duplicates later in the file... then the count will always only be 1. It's easy to do, and it's also easy to count all occurrences, but I need to know which way to go. — tdelaney, Mar 13 '16 at 20:19

PM 2Ring · Accepted Answer · 2016-03-13T20:47:30.930

The simple way to get a count of unique words is to use a set. I put your text into a file called 'qdata.txt'.

The file is very small, so there's no need to read it line by line: just read the whole thing into a single string, then split that string on whitespace and pass the resulting list into the set constructor:

fname = 'qdata.txt'
with open(fname) as f:
    words = set(f.read().split())
print(words, len(words))

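output (Python 2 style repr; Python 3 displays the set with braces, {...}, and the order of the elements is arbitrary)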

set(['towel', 'onion', 'nandos', 'calculator', 'pigeon', 'dog', 'cat', 'tiger', 'lion', 'cheeky', 'elephant', 'peg', 'fish']) 13

This works because "a set object is an unordered collection of distinct hashable objects". If you try to add a duplicate item to a set it's simply ignored. Please see the docs for further details.
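
A quick illustration of that behaviour, as a minimal sketch (the word list and variable names here are made up for the example, not taken from the question's file): adding an item that is already present leaves the set unchanged.

```python
# Illustrative only: the names and the word list are hypothetical.
seen = set()
for word in ["fish", "towel", "fish", "onion", "fish"]:
    seen.add(word)  # adding a word that is already in the set is a no-op

unique_count = len(seen)  # number of distinct words
print(sorted(seen), unique_count)
```

Sorting is only there to make the printout stable; the set itself has no defined order.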

For larger files, it is a good idea to read and process them line by line to avoid loading the whole file into RAM, but with modern OSes the file needs to be rather large before you see any benefit, due to file caching.

fname = 'qdata.txt'
words = set()
with open(fname) as f:
    for line in f:
        words.update(line.split())

print(words, len(words))
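
To check that the two approaches agree, here is a small self-contained sketch. It assumes the five lines from the question as input and uses io.StringIO as a stand-in for the real file; the helper name unique_words is hypothetical.

```python
import io

# Sample text copied from the question; io.StringIO stands in for the file.
SAMPLE = """elephant calculator fish
towel onion fish
nandos pigeon tiger
cheeky peg lion
dog cat fish
"""

def unique_words(lines):
    """Collect the distinct whitespace-separated words from an iterable of lines."""
    words = set()
    for line in lines:
        words.update(line.split())
    return words

whole_file = set(SAMPLE.split())                   # read everything at once
line_by_line = unique_words(io.StringIO(SAMPLE))   # process one line at a time

print(len(whole_file), len(line_by_line), whole_file == line_by_line)
```

Because unique_words accepts any iterable of lines, an open file object can be passed to it directly.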

tdelaney · Answer 2 · 2016-03-13T20:26:34.880

You have several problems, but the basic strategy is sound:

- dictionary is actually a list, which is what you want anyway. Rename it.
- You opened the file as file, which is fine in Python 3 but frowned upon in Python 2 because it masks the builtin file object. People are still sensitive about that, so it's best to use a different name.
- You didn't use the file variable, instead inventing something called inf.
- You split the wrong thing: .split is a string method, so call it on the line x that you read from the file, not on the list.
- There's no need to count the words yourself; lists know how long they are.

So, this would work better:

words = []
with open('file.txt') as fileobj:
    for x in fileobj:
        words += x.strip().split()
print(len(words))

collections.Counter is often used to count occurrences of words. Assuming you can use anything in the standard lib, this would work (notice I lower-cased the words so that Elephant and elephant count the same):

import collections
words = collections.Counter()  # start empty; Counter takes an iterable or mapping, not a type
with open('file.txt') as fileobj:
    for x in fileobj:
        words.update(word.lower() for word in x.strip().split())
# words is a dict-like object with a count of each word
print(len(words))
print(words)
# lets pick one
print('elephant count', words['elephant'])
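
A Counter can report both totals. The sketch below uses a made-up three-word list rather than the file, so the numbers are easy to verify by hand: len() gives the number of distinct words, while summing the values gives every occurrence.

```python
import collections

# Hypothetical input chosen for illustration: "fish" appears twice.
counts = collections.Counter(["fish", "fish", "cow"])

total_words = sum(counts.values())   # every occurrence counted: 3
distinct_words = len(counts)         # each word counted once: 2
print(total_words, distinct_words, counts.most_common(1))
```

So if duplicates should not contribute to the total, len(counts) is the number to use.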

thanks, how would you modify this so that duplicate words are not included in the total? e.g. "fish", "fish", "cow" word count = 2 — my name jeff, Mar 13 '16 at 20:14

score -1 · Answer 3 · answered Mar 13 '16 at 20:34

This may be inefficient, and it may never be used in a case like this, but as I am new as well, I wonder why the following wouldn't work for removing duplicates.

words = []
with open('file.txt') as fileobj:
    for x in fileobj:
        words += x.strip().split()
    for i in words:
        if words.count(i) > 1:
            words.remove(i)
print(len(words))
print(words)

Majority of code thanks to tdelaney.

Dan Priest

It's dangerous to remove items from a list that you're iterating over. It's a bit like sawing off a tree branch that you're sitting on. See [Remove items from a list while iterating in Python](http://stackoverflow.com/q/1207406/4014959). Also, `.count` isn't very efficient: it has to perform a linear scan over the entire list on each call. – PM 2Ring Mar 13 '16 at 20:38
Yes, I was a bit hesitant about the removing items from a list that is being iterated over. The rest makes sense as well. Explanation much appreciated. – Dan Priest Mar 13 '16 at 20:41
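
To make the pitfall concrete, here is a minimal sketch with a made-up word list. The first loop mirrors the remove-while-iterating pattern and leaves a duplicate behind because the iterator skips elements after each removal. The alternative uses dict.fromkeys, which keeps the first occurrence of each word; it relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
# Hypothetical word list chosen to expose the problem.
words = ["fish", "fish", "fish", "fish", "cow"]

# Pitfall: mutating the list while looping over it skips elements.
buggy = list(words)
for w in buggy:
    if buggy.count(w) > 1:   # linear scan on every call
        buggy.remove(w)      # shifts later items under the iterator
print(buggy)                 # a duplicate "fish" survives

# Safer: build a new, order-preserving list of distinct words.
deduped = list(dict.fromkeys(words))
print(deduped, len(deduped))
```

If the order of the words does not matter, set(words) does the same job.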
