Multiple words in single keyword and counting them in the data in python

Question

I'm trying to run the following Python code to count keywords in specific values of my dictionary. With keywords = ['is', 'my'] it works fine, but with keywords = ['is', 'my name'] it doesn't count the keyword 'my name'. I don't know what I'm doing wrong. Could anyone look at the code and help me out? Thank you.

from collections import Counter
import json 
from typing import List, Dict


keywords = ['is', 'my name']

def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    return {
        word: count for word, count in Counter(text.split()).items()
        if word in set(keywords)
    }

# Sample data: each policy entry carries the text to be searched.
data = {
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code"
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code"
        }
    }
}

# Add the keyword counts to each policy entry, then print the result.
for policy in data['policy'].values():
    policy.update(get_keyword_counts(policy['Text'], keywords))
print(json.dumps(data, indent=4))

text.split() splits at every space. For example 'foo my word'.split() gives ['foo', 'my', 'word'] not ['foo', 'my word']: so you'll never get 'my word' in your Counter. — slothrop, Jul 07 '22 at 10:25
@slothrop What could be the possible solution please? Thank you — ZA09, Jul 07 '22 at 10:30
Some ideas here: https://stackoverflow.com/questions/4664850/how-to-find-all-occurrences-of-a-substring. In your case, you probably care about word boundaries (you want to match "my word" but not "scammy wordles"), and an approach based on regular expressions would work well for this. — slothrop, Jul 07 '22 at 10:39

score 2 · Accepted Answer · answered Jul 07 '22 at 10:39

The substring "my name" is also split by text.split() in get_keyword_counts, so the value "my name" never appears as a whole; it becomes the separate tokens "my" and "name". I guess you want to count it as a whole, so here is what you need:

def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    return {
        word: text.count(word) for word in keywords
    }
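
As a quick check, here is a minimal sketch that runs the function above on the first sample text from the question. Note that str.count counts non-overlapping substring occurrences, so it does not look at word boundaries.

```python
from typing import Dict, List


def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    # str.count returns the number of non-overlapping occurrences
    # of the keyword as a substring of text.
    return {word: text.count(word) for word in keywords}


counts = get_keyword_counts(
    "my name is Martin and here is my code", ["is", "my name"]
)
print(counts)  # {'is': 2, 'my name': 1}
```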

answered Jul 07 '22 at 10:39

MercifulSory

Say text="grimy nameplate" and word="my name". Then text.count(word)=1: is that what is required? – slothrop Jul 07 '22 at 10:41
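
If whole-word matches are what is required, one possible approach is the regular-expression route suggested in the question comments. The sketch below is an illustration, not the answerer's code: the function name get_keyword_counts_whole_words is hypothetical, and it assumes every keyword starts and ends with a word character (letter, digit or underscore), since \b only marks boundaries next to such characters.

```python
import re
from typing import Dict, List


def get_keyword_counts_whole_words(text: str, keywords: List[str]) -> Dict[str, int]:
    # Hypothetical helper: counts each keyword only where it is
    # delimited by word boundaries on both sides.
    counts = {}
    for keyword in keywords:
        # re.escape guards against regex metacharacters in the keyword.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text))
    return counts


print(get_keyword_counts_whole_words("grimy nameplate", ["my name"]))
# {'my name': 0}
print(get_keyword_counts_whole_words(
    "my name is Martin and here is my code", ["is", "my name"]
))
# {'is': 2, 'my name': 1}
```

This version is case-sensitive, like text.count; passing flags=re.IGNORECASE to re.findall would relax that if needed.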

score 1 · Answer 2 · edited Jul 07 '22 at 17:03

You are using text.split(), which splits "my name" into the separate words "my" and "name", so the multi-word keyword is never matched. Use count() on the text instead and that should do it.

edited Jul 07 '22 at 17:03

ChrisGPT was on strike

answered Jul 07 '22 at 11:13

Nishant Modi
