Python: Nested dictionary - create if key doesn't exist, else sum 1

Question

SCENARIO
I am trying to count the number of times a word appears in a sentence, for a list of sentences. Each sentence is a list of words.
I want the final dictionary to have a key for each word in the entire corpus. Each word should map to an inner dictionary whose keys are the indices of the sentences in which the word appears, and whose values are the number of times it appears in that sentence.

CURRENT SOLUTION
The following code works correctly:

dfm = dict()
for i, sentence in enumerate(sentences):
    for word in sentence:
        if word not in dfm:        # first time this word is seen anywhere
            dfm[word] = dict()
        if i not in dfm[word]:     # first occurrence of the word in sentence i
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1

QUESTION
Is there a cleaner way to do this in Python?
I have already gone through two related posts, where they suggest using:

dic.setdefault(key,[]).append(value)

and,

d = defaultdict(lambda: defaultdict(dict))

I think they are good solutions, but I can't figure out how to adapt them to my particular problem.
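For reference, here is a minimal sketch of how the setdefault pattern could be adapted to the nested structure. Because the question gives no input, it assumes the sample sentences used in the first answer. setdefault creates the inner dictionary only when the word is missing, and dict.get supplies 0 for a sentence index that has not been counted yet.

```python
# Sample input (assumed; borrowed from the first answer below)
sentences = [['dog', 'is', 'big'], ['cat', 'is', 'big'], ['cat', 'is', 'dark']]

dfm = {}
for i, sentence in enumerate(sentences):
    for word in sentence:
        # Create the inner dict for this word only if it does not exist yet
        inner = dfm.setdefault(word, {})
        # dict.get returns 0 when sentence i has not been counted for this word
        inner[i] = inner.get(i, 0) + 1

print(dfm)
```

The result is an ordinary nested dict, so nothing needs converting afterwards.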

Thanks!

@temmo collections.Counter is normally a good choice, but looking at the structure the OP wants, I think defaultdict makes more sense. – Anton vBR, Oct 30 '19 at 14:03
Extra comment: Good work on preparing your question and to search for a possible solution before posting here on SO. You deserve a star! — Anton vBR, Oct 30 '19 at 14:16

score 4 · Answer 1 · edited Oct 30 '19 at 14:13


Say you have this input:

sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]

Your solution:

dfm = dict()
for i,sentence in enumerate(sentences):
    for word in sentence:
        if word not in dfm.keys():
            dfm[word] = dict()
        if i not in dfm[word].keys():
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1

Defaultdict int:

from collections import defaultdict

dfm2 = defaultdict(lambda: defaultdict(int))

for i,sentence in enumerate(sentences):
    for word in sentence:
        dfm2[word][i] += 1

Test:

dfm2 == dfm  # True


#{'dog': {0: 1},
# 'is': {0: 1, 1: 1, 2: 1},
# 'big': {0: 1, 1: 1},
# 'cat': {1: 1, 2: 1},
# 'dark': {2: 1}}
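A short follow-up sketch, using the same sample input, shows two details of this approach. First, the inner factory has to be int rather than dict: with the defaultdict(lambda: defaultdict(dict)) variant quoted in the question, a missing count defaults to {}, and {} + 1 raises TypeError. Second, the nested defaultdicts can be converted to plain dicts for a cleaner printout. The variable names plain and broken are only illustrative.

```python
from collections import defaultdict

sentences = [['dog', 'is', 'big'], ['cat', 'is', 'big'], ['cat', 'is', 'dark']]

# Inner factory is int, so a missing (word, sentence) count starts at 0
dfm2 = defaultdict(lambda: defaultdict(int))
for i, sentence in enumerate(sentences):
    for word in sentence:
        dfm2[word][i] += 1

# Convert to plain nested dicts: the repr is easier to read, and looking up
# a missing word raises KeyError instead of silently inserting an empty entry
plain = {word: dict(counts) for word, counts in dfm2.items()}
print(plain)

# With dict as the inner factory, a missing count defaults to {},
# and adding 1 to a dict is not defined
broken = defaultdict(lambda: defaultdict(dict))
try:
    broken['dog'][0] += 1
except TypeError as exc:
    print('TypeError:', exc)
```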

edited Oct 30 '19 at 14:13

Anton vBR


answered Oct 30 '19 at 14:04

rassar



This was exactly the answer I had in mind, so I took the liberty of editing it to compare it with the original solution. Hope that is OK with you, @rassar. – Anton vBR Oct 30 '19 at 14:12

@AntonvBR That's great! Thank you! – rassar Oct 30 '19 at 14:13

sahasrara62 · Answer 2 · 2019-10-30T14:14:40.587

For a cleaner version, use Counter:

from collections import Counter

string = 'this is america this is america'
x=Counter(string.split())
print(x)

Output:

Counter({'this': 2, 'is': 2, 'america': 2})

If you want to write the counting code yourself:

(copying the input data, sentences, from @rassar's answer)

def func(list_: list):
    dic = {}
    for sub_list in list_:
        for word in sub_list:
            if word not in dic:
                dic[word] = 1    # first occurrence of this word in the corpus
            else:
                dic[word] += 1   # seen before: increment its total count
    return dic


sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]


print(func(sentences))

Output:

{'dog': 1, 'is': 3, 'big': 2, 'cat': 2, 'dark': 1}
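For comparison, a sketch of the same corpus-wide totals with Counter, flattening the list of sentences with itertools.chain.from_iterable. Like func, this counts each word over the whole corpus and does not keep the per-sentence breakdown the question asks for.

```python
from collections import Counter
from itertools import chain

sentences = [['dog', 'is', 'big'], ['cat', 'is', 'big'], ['cat', 'is', 'dark']]

# Flatten the sentences into one stream of words and count each word
totals = Counter(chain.from_iterable(sentences))
print(dict(totals))  # {'dog': 1, 'is': 3, 'big': 2, 'cat': 2, 'dark': 1}
```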

edited Oct 30 '19 at 14:14

answered Oct 30 '19 at 14:04

sahasrara62


This isn't correct - OP wants a dictionary mapping each word to a dictionary, where the second dictionary maps {sentence number: count of that word in that sentence}. – rassar Oct 30 '19 at 14:05
@rassar also added dict mapping – sahasrara62 Oct 30 '19 at 14:08
I would add a `split()` at the beginning of the function so that we can input something fancy like `sentences = ["dog is big", "cat is dark"]` directly. – Guimoute Oct 30 '19 at 14:18
@Guimoute I had added it that way (you can see it in the previous edits), but the OP didn't give any input data, so I used rassar's input data and modified the code accordingly :) – sahasrara62 Oct 30 '19 at 14:21
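A minimal sketch of the suggestion in the comments, applied to the nested structure the OP wants: the function splits plain strings on whitespace and accepts already-tokenized lists unchanged. The name count_words_by_sentence is hypothetical, and splitting on whitespace only (no punctuation handling) is an assumption.

```python
from collections import defaultdict


def count_words_by_sentence(sentences):
    """Map each word to {sentence index: count} (hypothetical helper).

    Accepts either lists of words or plain strings, which are split on
    whitespace.
    """
    dfm = defaultdict(lambda: defaultdict(int))
    for i, sentence in enumerate(sentences):
        # Split plain strings; leave pre-tokenized lists as they are
        words = sentence.split() if isinstance(sentence, str) else sentence
        for word in words:
            dfm[word][i] += 1
    # Return plain nested dicts so callers don't get defaultdict side effects
    return {word: dict(counts) for word, counts in dfm.items()}


result = count_words_by_sentence(["dog is big", "cat is dark"])
print(result)
```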

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

Use Counter objects, one per sentence:

from collections import Counter
    
sentences = ["This is Day", "Never say die", "Chat is a good bot", "Hello World", "Two plus two equals four","A quick brown fox jumps over the lazy dog", "Young chef, bring whisky with fifteen hydrogen ice cubes"]

sentenceWords = ( Counter(x.lower() for x in sentence.split()) for sentence in sentences)

#print result
print("\n".join(str(c) for c in sentenceWords))
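The per-sentence Counters can be inverted into the word → {sentence index: count} mapping the question asks for. This is a sketch that uses the first three sample sentences from this answer. It builds a list rather than a generator, because the sentenceWords generator above is exhausted after a single pass.

```python
from collections import Counter

sentences = ["This is Day", "Never say die", "Chat is a good bot"]

# One Counter per sentence, lower-cased as in the answer above
per_sentence = [Counter(w.lower() for w in s.split()) for s in sentences]

dfm = {}
for i, counts in enumerate(per_sentence):
    for word, n in counts.items():
        # Create the word's inner dict if needed, then store its count in sentence i
        dfm.setdefault(word, {})[i] = n

print(dfm['is'])  # {0: 1, 2: 1}
```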
