How do I count the number of words spoken by each character in a dialogue and store the count in a dictionary?

Question

I'm trying to count the number of words spoken by the characters "Michael" and "Jim" in the following dialogue and store the counts in a dictionary that looks like {"Michael:":15, "Jim:":10}.

string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

My idea was to create a dictionary with the character names as keys and 0 as the starting values, split the string on " ", count the list elements between consecutive character names (using the keys to locate them), and store those word counts as the values. This is the code I've used so far:

dict = {"Michael:" : 0,
        "Jim:" : 0}

list = string.split(" ")

indices = [i for i, x in enumerate(list) if x in dict.keys()]
nums = []
for i in range(1,len(indices)):
    nums.append(indices[i] - indices[i-1])
print(nums)

The result is a list that prints as `[15, 10, 15, 9]`.
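Each difference also counts the speaker tag itself (Michael's first speech is 14 words, but the gap is 15), and the final speech is never measured because no name follows it. A minimal sketch that keeps the index-difference idea, assuming every speaker tag is a standalone token such as `"Jim:"`: append a sentinel index one past the end, subtract 1 from each gap, and add the result to the dictionary. The names `text`, `speaker_counts` and `words` are illustrative.

```python
# Sample dialogue from the question
text = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

speaker_counts = {"Michael:": 0, "Jim:": 0}
words = text.split(" ")

# Positions of every speaker tag in the word list
indices = [i for i, word in enumerate(words) if word in speaker_counts]
# Sentinel one past the last word, so the final speech also has an end boundary
indices.append(len(words))

# Each (start, end) pair brackets one speech; the gap includes the tag itself
for start, end in zip(indices, indices[1:]):
    speaker_counts[words[start]] += end - start - 1

print(speaker_counts)
```

For this sample it should give `{'Michael:': 38, 'Jim:': 17}`, which matches the totals reported in the answers.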

I think I need help with the following:

- A better approach, if possible
- A way to count the words spoken by a character when their line is the last line of the dialogue
- A way to update the dict automatically with the number of words spoken by each character

The last point is crucial because I'm trying to replicate this process for an episode's worth of quotes.

Thank you in advance!

What @Sujay means is that `string` is a standard library module, so you make it unavailable by using it as a variable name (yes, you could `import string as still_available_string`). — JL Peyret, Jul 02 '21 at 03:40
Right, didn't notice, since I only used the OP's string definition. — JL Peyret, Jul 02 '21 at 03:43
Thanks for your help, guys. Using built-in names as variable names was a one-time mistake, but thanks for pointing that out as well. — beginnerprogrammerforever, Jul 03 '21 at 01:25
@beginnerprogrammerforever well... accepting an answer or upvoting the ones you found helpful is the usual manner of thanking people here. — JL Peyret, Jul 03 '21 at 02:07
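To illustrate the point about `string`: rebinding the name hides the module, while an alias made with `import ... as ...` keeps it reachable. This is only a sketch; in practice, giving the variable a different name such as `text` is the simpler fix.

```python
import string  # the standard-library module
import string as still_available_string  # a second name for the same module

# Rebinding the name `string` hides the module in this scope
string = "Michael: All right Jim."

# `string` is now an ordinary str, so module attributes are gone
print(hasattr(string, "ascii_uppercase"))  # False
# The alias still refers to the module object
print(still_available_string.ascii_uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```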

score 1 · Accepted Answer · answered Jul 02 '21 at 01:51


Loop through the words, incrementing the appropriate counts as you go.

dialogue_dict = {"Michael:" : 0, "Jim:" : 0}

words = string.split(" ")
current_character = None
for word in words:
    if word in dialogue_dict:
        current_character = word
    elif current_character:
        dialogue_dict[current_character] += 1
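The same loop can be wrapped in a function so it can be rerun on each scene or on a whole episode's transcript. This is a sketch; the name `count_words_by_speaker` is hypothetical, and it uses `split()` with no argument, which also tolerates runs of spaces or newlines between words. Because a speaker's words are counted as they are read, the final speech is included without any special case.

```python
def count_words_by_speaker(text, speakers):
    """Count the words after each speaker tag; `speakers` are tags like "Jim:"."""
    counts = {speaker: 0 for speaker in speakers}
    current_speaker = None
    for word in text.split():
        if word in counts:
            current_speaker = word  # a tag switches the active speaker
        elif current_speaker is not None:
            counts[current_speaker] += 1  # any other token is a spoken word
    return counts


dialogue = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
totals = count_words_by_speaker(dialogue, ["Michael:", "Jim:"])
print(totals)  # {'Michael:': 38, 'Jim:': 17}
```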

BTW, don't use `list` and `dict` as variable names; doing so shadows the built-ins with those names.
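A quick sketch of what that shadowing does: once `list` is bound to a list object at module level, calling `list(...)` fails, and deleting the binding makes the built-in visible again.

```python
list = "Michael: All right".split(" ")  # rebinds the built-in name `list`

try:
    list("abc")  # the built-in is no longer reachable under this name
except TypeError as exc:
    message = str(exc)
    print(message)  # 'list' object is not callable

del list  # removing the module-level binding exposes the built-in again
print(list("abc"))  # ['a', 'b', 'c']
```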

Barmar

Thanks, Barmar. I had some follow-up questions to make sure I understand this clearly: 1. Why didn't you use `if word in dialogue_dict.keys():`? Shouldn't we just be looking at the keys? – beginnerprogrammerforever Jul 03 '21 at 01:32

When a dict is used as an iterable, it just returns the keys. So `in dialogue_dict` is the same as `in dialogue_dict.keys()`. – Barmar Jul 03 '21 at 17:03

You can see the same thing when you do `for key in dialogue_dict:` – Barmar Jul 03 '21 at 17:04
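A small check of both points, using the dictionary from the answer: `in` tests keys (not values), and iterating a dict yields its keys in insertion order.

```python
dialogue_dict = {"Michael:": 0, "Jim:": 0}

# `in` on a dict tests the keys, exactly like `in dialogue_dict.keys()`
print("Jim:" in dialogue_dict, "Jim:" in dialogue_dict.keys())  # True True

# Values are not searched: 0 is a value here, not a key
print(0 in dialogue_dict)  # False

# Plain iteration over a dict also yields the keys, in insertion order
for key in dialogue_dict:
    print(key)
```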

Answer 2 · answered Jul 02 '21 at 03:37 · JL Peyret

Use a regex to split by character names, keeping the names as separators, then iterate over the character/line pairs in chunks of 2.
- use a `collections.defaultdict(int)` so that each new character automatically starts at 0, and add the word count of the current line to it.
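The key detail is the capturing group: when the pattern passed to `re.split` contains parentheses, the matched text (here, the name) is kept in the result. A short sketch on a made-up two-line sample shows the difference:

```python
import re

sample = "Michael: Hi Jim. Jim: Hello."

# With a capturing group, the names are kept between the pieces
with_group = re.split(r"([A-Z][a-z'-]+:)", sample)
print(with_group)  # ['', 'Michael:', ' Hi Jim. ', 'Jim:', ' Hello.']

# Without one, the names are discarded
without_group = re.split(r"[A-Z][a-z'-]+:", sample)
print(without_group)  # ['', ' Hi Jim. ', ' Hello.']
```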

string_ = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

import re
from collections import defaultdict

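# This assumes a character name has no blanks, starts with a capital letter, and is followed by a `:`
pat = re.compile(r"([A-Z][a-z'-]+:)")

# Because the pattern has a capturing group, splitting also returns the delimiters (the names);
# `if v` drops the empty string produced before the first name
li = [v for v in pat.split(string_) if v]

# split the list into consecutive chunks of n items (here: character, line pairs)
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

# use a defaultdict to start new characters at 0
# (collections.Counter could also work)
counter = defaultdict(int)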
counter = defaultdict(int)

pairs = chunks(li,2)
for character, line in pairs:
    counter[character.rstrip(":")] += len(line.split())
 
print(f"{counter=}")

output:

counter=defaultdict(<class 'int'>, {'Michael': 38, 'Jim': 17})
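As the code comment notes, `collections.Counter` could replace the `defaultdict`. The sketch below does that and also swaps the `chunks` helper for slicing, pairing even positions (names) with odd positions (lines) via `zip`. It assumes, as above, that the text starts with a speaker name so the list strictly alternates; `text` and `parts` are illustrative names.

```python
import re
from collections import Counter

text = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

# Same assumption as above: a name is one capitalized word followed by ":"
pat = re.compile(r"([A-Z][a-z'-]+:)")
parts = [p for p in pat.split(text) if p]

counter = Counter()
# Names sit at even positions, their lines at the following odd positions
for character, line in zip(parts[::2], parts[1::2]):
    counter[character.rstrip(":")] += len(line.split())

print(counter)
```

This should print `Counter({'Michael': 38, 'Jim': 17})`. A `Counter` also offers `most_common()` for ranking characters by word count.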

score 1 · Answer 3 · answered Jul 02 '21 at 03:40

We can do this with regexes, without providing the speaker names in advance:

import re

string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
dialog_count = {}

# extract the speakers using a regex
speakers = re.findall(r'\w+:', string)
# split the text into the speeches between speakers
sentences = re.split(r'\w+:', string)
speakers = filter(lambda x: x.strip() != '', speakers)
sentences = filter(lambda x: x.strip() != '', sentences)

# map each speaker to its sentence
dialogs = zip(list(speakers), list(sentences))

# count total words
for speaker, dialog in dialogs:
    dialog = dialog.split(" ")
    dialog = list(filter(lambda x: x.strip() != '', dialog))
    dialog_count[speaker] = dialog_count.get(speaker, 0) + len(dialog)
print(dialog_count)

output:

{'Michael:': 38, 'Jim:': 17}
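The `filter` calls can be dropped: `re.findall` never returns an empty match for `\w+:` (the `+` needs at least one character), and `str.split()` with no argument already discards empty strings. A trimmed sketch, assuming the text begins with a speaker tag, so only the first `re.split` segment (the empty string before that tag) needs discarding; slicing off that one segment instead of filtering keeps speakers and speeches aligned even if a speech were empty. Note that `\w+:` would also match something like the `10:` in a time such as `10:30`, since `\w` includes digits.

```python
import re

text = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

speakers = re.findall(r"\w+:", text)
# Assumes the text starts with a tag: drop only the empty segment before it
speeches = re.split(r"\w+:", text)[1:]

dialog_count = {}
for speaker, speech in zip(speakers, speeches):
    # split() with no argument ignores leading/trailing and repeated whitespace
    dialog_count[speaker] = dialog_count.get(speaker, 0) + len(speech.split())

print(dialog_count)  # {'Michael:': 38, 'Jim:': 17}
```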
