
Hi, I am new to Python and I am trying to figure out how to output the frequency probability of each word in a given list. I am using Counter (from collections import Counter) to count how often each word occurs, but I am not sure how to return each word's probability instead of its count.

from collections import Counter

text_combined = (
    "Boston Celtics end Denver Nuggets' eight-game winning"
    " run Curry sets Warriors all-timescoring record A"
    " Brooklyn Artist Wants Sports Fans to Wear Their"
    " Names Nikola Jokic, Nuggets beat Spurs for 7th win in"
    " row - Reuters Mikko Rantanen scores twice as Avalanche"
    " down Coyotes - Reuters Phillies' Aaron "
)
p = Counter(text_combined.split())  # count each whitespace-separated word
print(p.most_common(50))

This returns the following:

[('-', 2),
 ('Reuters', 2),
 ('Boston', 1),
 ('Celtics', 1),
 ('end', 1),
 ('Denver', 1),
 ("Nuggets'", 1),
 ('eight-game', 1),
 ('winning', 1),
 ('run', 1),
 ('Curry', 1),
 ('sets', 1),
 ('Warriors', 1),
 ('all-timescoring', 1),
 ('record', 1),
 ('A', 1),
 ('Brooklyn', 1),
 ('Artist', 1),
 ('Wants', 1),
 ('Sports', 1),
 ('Fans', 1),
 ('to', 1),
 ('Wear', 1),
 ('Their', 1),
 ('Names', 1),
 ('Nikola', 1),
 ('Jokic,', 1),
 ('Nuggets', 1),
 ('beat', 1),
 ('Spurs', 1),
 ('for', 1),
 ('7th', 1),
 ('win', 1),
 ('in', 1),
 ('row', 1),
 ('Mikko', 1),
 ('Rantanen', 1),
 ('scores', 1),
 ('twice', 1),
 ('as', 1),
 ('Avalanche', 1),
 ('down', 1),
 ('Coyotes', 1),
 ("Phillies'", 1),
 ('Aaron', 1)]

But I would like something like this:

{'COVID': 0.6666666666666666,
 'Colorado': 0.6296296296296297,
 'Denver': 0.6296296296296297,
 'Denver Bronco': 0.7407407407407407,
 'NFL': 0.48148148148148145,
 'New': 0.5925925925925926,
 'coronavirus': 0.7407407407407407,
 'snow': 1.0,
 'will': 0.4074074074074074,
 'year': 0.4074074074074074}
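The values in this sample do not add up to 1. Each one is a multiple of 1/27 (for example, 0.6296296296296297 = 17/27), and the top entry, 'snow', is exactly 1.0. The question does not say how these numbers were produced. They could be scaled by the largest count, or they could be the share of 27 documents that contain each term. 'Denver Bronco' is also a two-word phrase, which split() alone will not produce. If the max-scaled version is what you want, the minimal sketch below divides by the largest count instead of the total. The max-scaling interpretation is an assumption.

```python
from collections import Counter

text_combined = (
    "Boston Celtics end Denver Nuggets' eight-game winning"
    " run Curry sets Warriors all-timescoring record A"
    " Brooklyn Artist Wants Sports Fans to Wear Their"
    " Names Nikola Jokic, Nuggets beat Spurs for 7th win in"
    " row - Reuters Mikko Rantanen scores twice as Avalanche"
    " down Coyotes - Reuters Phillies' Aaron "
)

counts = Counter(text_combined.split())

# Assumption: scale by the largest count so the most frequent word maps to 1.0,
# which is one possible reading of the sample output above.
max_count = max(counts.values())
scaled = {word: count / max_count for word, count in counts.items()}

print(scaled["Reuters"], scaled["Boston"])  # 1.0 0.5
```

With this scaling the values no longer sum to 1, so they are relative weights rather than probabilities.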
Laurent LAPORTE
Poala

1 Answer


You can divide each count by the total number of words:

from collections import Counter

text_combined = (
    "Boston Celtics end Denver Nuggets' eight-game winning"
    " run Curry sets Warriors all-timescoring record A"
    " Brooklyn Artist Wants Sports Fans to Wear Their"
    " Names Nikola Jokic, Nuggets beat Spurs for 7th win in"
    " row - Reuters Mikko Rantanen scores twice as Avalanche"
    " down Coyotes - Reuters Phillies' Aaron "
)
p = Counter(text_combined.split())
total = sum(p.values())  # total number of words (47 in this text)
result = {k: v / total for k, v in p.items()}  # each word's count divided by the total
print(result)

This gives:

{'-': 0.0425531914893617, 'Reuters': 0.0425531914893617, 'Boston': 0.02127659574468085, ...}
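You may want to keep the ranking that most_common(50) gave in the question. One option is to divide inside a comprehension over most_common(), since it already yields (word, count) pairs sorted from most to least frequent. The sketch below uses sum(counts.values()). Counter.total() returns the same sum but only exists on Python 3.10 and later.

```python
from collections import Counter

text_combined = (
    "Boston Celtics end Denver Nuggets' eight-game winning"
    " run Curry sets Warriors all-timescoring record A"
    " Brooklyn Artist Wants Sports Fans to Wear Their"
    " Names Nikola Jokic, Nuggets beat Spurs for 7th win in"
    " row - Reuters Mikko Rantanen scores twice as Avalanche"
    " down Coyotes - Reuters Phillies' Aaron "
)

counts = Counter(text_combined.split())
total = sum(counts.values())  # same as counts.total() on Python 3.10+

# most_common() sorts by count (ties keep first-seen order), so the
# resulting list of (word, probability) pairs is ranked the same way.
top_probs = [(word, count / total) for word, count in counts.most_common(50)]

print(top_probs[:3])
```

Because every count is divided by the same total, these probabilities add up to 1, up to floating-point rounding.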
Laurent LAPORTE
azro