I'm trying to run the following Python code to count keywords in specific values of my dictionary. With keywords = ['is', 'my'] it works fine, but with keywords = ['is', 'my name'] it doesn't count the keyword my name. I don't know what I'm doing wrong. Can anyone take a look at the code and help me out? Thank you.

from collections import Counter
import json 
from typing import List, Dict


keywords = ['is', 'my name']

def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    return {
        word: count for word, count in Counter(text.split()).items()
        if word in set(keywords)
    }

data = {
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code"
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code"
        }
    }
}

for policy in data['policy'].values():
    policy.update(get_keyword_counts(policy['Text'], keywords))
print(json.dumps(data, indent=4))

ZA09
  • text.split() splits on whitespace. For example, 'foo my word'.split() gives ['foo', 'my', 'word'], not ['foo', 'my word'], so you'll never get 'my word' in your Counter. – slothrop Jul 07 '22 at 10:25
  • @slothrop What could be the possible solution please? Thank you – ZA09 Jul 07 '22 at 10:30
  • Some ideas here: https://stackoverflow.com/questions/4664850/how-to-find-all-occurrences-of-a-substring. In your case, you probably care about word boundaries (you want to match "my word" but not "scammy wordles"), and an approach based on regular expressions would work well for this. – slothrop Jul 07 '22 at 10:39
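The regular-expression idea from the comments could look roughly like the sketch below. The function name count_keywords_regex is made up for illustration and is not from the original post; it assumes each keyword should match only as whole words, so "my word" is not counted inside "scammy wordles".

```python
import re
from typing import Dict, List


def count_keywords_regex(text: str, keywords: List[str]) -> Dict[str, int]:
    """Count whole-word occurrences of each keyword, including multi-word ones.

    Hypothetical helper, not part of the original question.
    """
    counts = {}
    for keyword in keywords:
        # re.escape guards against regex metacharacters in the keyword;
        # \b requires a word boundary on both sides of the match.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text))
    return counts


result = count_keywords_regex("my name is Martin and here is my code", ["is", "my name"])
print(result)  # {'is': 2, 'my name': 1}

no_match = count_keywords_regex("scammy wordles", ["my word"])
print(no_match)  # {'my word': 0}
```

Matching is case-sensitive here; passing flags=re.IGNORECASE to re.findall would relax that if needed.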

2 Answers

The substring "my name" is also split by text.split() in get_keyword_counts, so the Counter never contains "my name" as a single value, only "my" and "name" separately. I guess you want to count it as a whole, so this is what you need:

def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    return {
        word: text.count(word) for word in keywords
    }
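For reference, here is a self-contained sketch of how that function behaves when plugged into the loop from the question. The data and keywords are copied from the question; nothing else is assumed.

```python
import json
from typing import Dict, List


def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    # str.count returns the number of non-overlapping occurrences
    # of the keyword as a substring of text.
    return {word: text.count(word) for word in keywords}


keywords = ["is", "my name"]

data = {
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com",
              "Text": "my name is Martin and here is my code"},
        "2": {"ID": "ML_1", "URL": "www.b.com",
              "Text": "my name is Mikal and here is my code"},
    }
}

# Add one key per keyword to each policy entry.
for policy in data["policy"].values():
    policy.update(get_keyword_counts(policy["Text"], keywords))

print(json.dumps(data, indent=4))
```

Each policy entry ends up with "is": 2 and "my name": 1 next to its ID, URL and Text keys.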
MercifulSory

You are using text.split(), which splits "my name" into the separate words "my" and "name", so the two-word keyword never shows up in the Counter. Use str.count() on the text instead and that should do it.
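One thing to keep in mind with count(): it matches substrings, not whole words. The short comparison below uses a made-up sample sentence (not from the question) to show the difference.

```python
text = "this is my name"  # illustrative sample, not from the question

# Substring counting: "is" is found inside "this" and as the word "is".
substring_hits = text.count("is")

# Whole-word counting via split(): only the standalone word "is" counts,
# but a two-word keyword can never equal a single split token.
word_hits = text.split().count("is")
phrase_hits_after_split = text.split().count("my name")
phrase_hits_substring = text.count("my name")

print(substring_hits)           # 2
print(word_hits)                # 1
print(phrase_hits_after_split)  # 0
print(phrase_hits_substring)    # 1
```

For the texts in the question this makes no difference, but it can inflate counts for short keywords on other input.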

ChrisGPT was on strike