0

I have a list of strings:

string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']

and a list of words:

words=['hope','court','mention','maryland']

Now I want to count how often each word in `words` occurs in each string and store the results in a separate dictionary, with keys of the form 'doc_(index)' and values that are nested dictionaries mapping each word that occurs to its count. Expected output:

words_dict={'doc_1':{'court':2,'hope':1},'doc_2':{'court':1,'hope':1},'doc_3':{'mention':1,'hope':1,'maryland':1}}

My first step was:

docs_dict={}
count=0
for i in string_list:
    count+=1
    docs_dic['doc_'+str(count)]=i
print (docs_dic)

{'doc_1': 'philadelphia court excessive disappointed court hope', 'doc_2': 'hope jurisdiction obscures acquittal court', 'doc_3': 'mention hope maryland signal held problem internal reform life bolster level grievance'}

After this, I can't work out how to get the word counts. What I have so far:

docs={}
for k,v in words_dic.items():
    split_words=v.split()
    for i in words:
        if i in split_words:
            docs[k][i]+=1
        else:
            docs[k][i]=0
Learner

4 Answers

1

You can use the string method `count` to get the number of times each word occurs in a sentence.

Check this code:

words_dict = {}
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words_list=['hope','court','mention','maryland']
for i in range(len(string_list)): #iterate over string list
    helper = {} #temporary dictionary
    for word in words_list: #iterate over word list
        x = string_list[i].count(word) #count no. of occurrences of word in sentence
        if x > 0:
            helper[word]=x
    words_dict["doc_"+str(i+1)]=helper #add temporary dictionary into final dictionary

#Print dictionary contents
for i in words_dict:
    print(i + ": " + str(words_dict[i]))

The output of the above code is:

doc_3: {'maryland': 1, 'mention': 1, 'hope': 1}
doc_2: {'court': 1, 'hope': 1}
doc_1: {'court': 2, 'hope': 1}
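One caveat that may matter for other inputs: `str.count` counts substring matches, not whole words. The sketch below uses a made-up sentence (not from the question) to show the difference between counting on the string and counting on the split list of tokens.

```python
# Hypothetical sentence, chosen only to illustrate the difference.
sentence = 'the courtroom was full but the court adjourned'

# str.count looks for substrings, so 'court' is also found inside 'courtroom'.
substring_hits = sentence.count('court')

# Splitting first and using list.count only matches whole tokens.
whole_word_hits = sentence.split().count('court')

print(substring_hits, whole_word_hits)
```

For the strings in the question both approaches give the same counts, because none of the search words appears inside a longer word.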
Light Yagami
  • Can you please explain what I did wrong in my second piece of code? – Learner Jul 08 '19 at 07:30
  • @Learner Your code is not clear; please correct it. In the first part, you defined the dictionary as 'docs_dict' but used 'docs_dic'. In the second part, at 'docs[k][i]+=1', you are updating the dictionary without initializing any value. That's the issue. – Light Yagami Jul 08 '19 at 07:45
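To make the point about initialization concrete, here is a minimal sketch of how the question's two loops could be repaired. It assumes the intent was to loop over `docs_dict` (the question's second snippet refers to an undefined `words_dic`) and to count every occurrence, not just test membership.

```python
string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

# Build the doc_N -> text mapping, using one name (docs_dict) consistently.
docs_dict = {}
for count, text in enumerate(string_list, start=1):
    docs_dict['doc_' + str(count)] = text

docs = {}
for k, v in docs_dict.items():
    split_words = v.split()
    docs[k] = {}  # create the inner dictionary before writing into it
    for w in words:
        n = split_words.count(w)  # list.count counts whole-word matches
        if n > 0:
            docs[k][w] = n

print(docs)
```

Without the `docs[k] = {}` line, `docs[k][w]` raises a `KeyError`, because the outer key does not exist yet.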
0

Use `Counter` from `collections` to count the words in each document.

Try this,

>>> from collections import Counter
>>> string_list = ['philadelphia court excessive disappointed court hope', 'hope jurisdiction obscures acquittal court', 'mention hope maryland signal held problem internal reform life bolster level grievance']
>>> words=['hope','court','mention','maryland']
>>> d = {}
>>> for i, doc in enumerate(string_list, 1):  # start at 1 so the keys are doc_1, doc_2, ...
...     for word, count in Counter(doc.split()).items():
...         if word in words:
...             d.setdefault("doc_{}".format(i), {})[word] = count
...

Output:

>>> d
{'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}
shaik moeed
0

It looks like the question here can help.

Below is my attempt at code that does what you need.

from collections import Counter
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words=['hope','court','mention','maryland']


result_dict = {}

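for index, value in enumerate(string_list, 1):  # number the documents from 1
    string_split = value.split(" ")
    cntr = Counter(string_split)
    # keep only the words that actually occur in this document
    result = {key: cntr[key] for key in words if cntr[key] > 0}
    result_dict['doc_' + str(index)] = result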


Hope you find it useful.

  • What is the use of `string_list_list = [x.split(" ") for x in string_list]`? – Learner Jul 10 '19 at 12:07
  • This is a list comprehension. In this case it creates a list of sub-lists, one per sentence, each holding the words of that sentence. For example, the first sub-list would be `['philadelphia', 'court', 'excessive', 'disappointed', 'court', 'hope']`. – Col Bates - collynomial Jul 15 '19 at 07:21
  • You are right! Deleted. Sorry, it was easier to do it with `enumerate` in the end, and I left that line in by accident. – Col Bates - collynomial Jul 15 '19 at 09:55
0

Try this,

from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

result = {
    f'doc_{i}': {key: value for key, value in Counter(doc.split()).items() if key in words}
    for i, doc in enumerate(string_list, start=1)
}
print(result)

Output:

{'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}
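If this is needed in more than one place, the same idea could be wrapped in a small function. The sketch below is only one possible arrangement: the name `count_words_per_doc` and its parameters are made up for illustration, and the word list is converted to a set so that the membership test does not scan the list for every token.

```python
from collections import Counter


def count_words_per_doc(docs, vocabulary):
    """Return {'doc_N': {word: count}} for the vocabulary words found in each doc."""
    vocab = set(vocabulary)  # set lookups avoid scanning the word list each time
    result = {}
    for i, doc in enumerate(docs, start=1):
        counts = Counter(doc.split())  # whole-word counts for this document
        result[f'doc_{i}'] = {w: c for w, c in counts.items() if w in vocab}
    return result


string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

words_dict = count_words_per_doc(string_list, words)
print(words_dict)
```

A document that contains none of the words gets an empty inner dictionary here, which may or may not be what you want.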
Kushan Gunasekera