
I'm trying to count the number of words spoken by the characters "Michael" and "Jim" in the following dialogue and store them in a dictionary that looks like {"Michael:":15, "Jim:":10}.

string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

I thought of creating a dictionary with the character names as keys, splitting the string by " ", counting the number of resulting list elements between the character names (using the keys as a reference), and then storing the word counts as values. This is the code I've used so far:

dict = {"Michael:" : 0,
        "Jim:" : 0}

list = string.split(" ")

indices = [i for i, x in enumerate(list) if x in dict.keys()]
nums = []
for i in range(1,len(indices)):
    nums.append(indices[i] - indices[i-1])
print(nums)

The result is a list that prints as [15, 10, 15, 9]

I think I need help with the following:

  1. A better approach if possible
  2. A way to count the number of words spoken by a character when that line is the last line of the dialogue
  3. A way to update the dict with an automatic count of words spoken by the character

The last point is crucial because I'm trying to replicate this process for an episode's worth of quotes.

Thank you in advance!

3 Answers


Loop through the words, incrementing the appropriate counts as you go.

dialogue_dict = {"Michael:" : 0, "Jim:" : 0}

words = string.split(" ")
current_character = None
for word in words:
    if word in dialogue_dict:
        current_character = word
    elif current_character:
        dialogue_dict[current_character] += 1

BTW, don't use list and dict as variable names. Doing so shadows the built-in functions with those names, so they can no longer be called in that scope.
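To illustrate the naming point, here is a small sketch of what can go wrong when `list` is reused as a variable name. The function name `shadowed` and the sample strings are made up for this example.

```python
def shadowed():
    # Inside this function, `list` now refers to a list object,
    # not the built-in type.
    list = "a b c".split(" ")
    try:
        list("abc")  # tries to call the list object
    except TypeError as exc:
        return str(exc)
    return None

message = shadowed()

# Outside the function nothing was rebound, so the built-in still works.
letters = list("abc")
print(message)
print(letters)
```

The same failure would show up in any later code that calls `list(...)` or `dict(...)` after the names have been reassigned in the same scope.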

Barmar
  • Thanks, Barmar. I had a follow-up question to make sure I understand this clearly: why didn't you use `if word in dialogue_dict.keys():`? Shouldn't we just be looking at the keys? – beginnerprogrammerforever Jul 03 '21 at 01:32
  • When a dict is used as an iterable, it just returns the keys. So `in dialogue_dict` is the same as `in dialogue_dict.keys()`. – Barmar Jul 03 '21 at 17:03
  • You can see the same thing when you do `for key in dialogue_dict:` – Barmar Jul 03 '21 at 17:04
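A quick way to check the point made in these comments is the sketch below; the variable names other than `dialogue_dict` are only for illustration.

```python
dialogue_dict = {"Michael:": 0, "Jim:": 0}

# Membership tests on the dict and on its keys view give the same answer.
in_dict = "Jim:" in dialogue_dict
in_keys = "Jim:" in dialogue_dict.keys()

# Iterating over the dict yields its keys.
keys_from_iteration = [key for key in dialogue_dict]

# Values are not considered by `in`.
value_found = 0 in dialogue_dict

print(in_dict, in_keys, keys_from_iteration, value_found)
```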
string_ = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

import re
from collections import defaultdict

#This assumes a character name has no blanks and is followed by a `:`
pat = re.compile("([A-Z][a-z'-]+:)")

#splitting with a capturing group returns the delimiters (the character names) as well
li = [v for v in pat.split(string_) if v]

# group the list into chunks of n items (here: name, line pairs)
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

#use a defaultdict to start new characters at 0
#collections.Counter could also work
counter = defaultdict(int)

pairs = chunks(li,2)
for character, line in pairs:
    counter[character.rstrip(":")] += len(line.split())
 
print(f"{counter=}")

output:

counter=defaultdict(<class 'int'>, {'Michael': 38, 'Jim': 17})
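The code comment above mentions that `collections.Counter` could also work. A minimal sketch of that variant follows, under the same assumption that a character name is a single capitalised word followed by `:`. The function name `count_words_by_speaker` and the shortened sample text are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

def count_words_by_speaker(text):
    # Assumption: a name is one capitalised word followed by ':'.
    # The capturing group keeps the names (without the colon) in the result.
    parts = re.split(r"([A-Z][a-z'-]+):", text)
    counts = Counter()
    # parts[0] is whatever precedes the first name; after that the list
    # alternates name, line, name, line, ...
    for name, line in zip(parts[1::2], parts[2::2]):
        counts[name] += len(line.split())
    return counts

sample = "Michael: All right Jim. Jim: Oh, I told you. Michael: So..."
result = count_words_by_speaker(sample)
print(result)
```

Because the final speaker's line is simply the last element of the split, no special handling is needed for the end of the dialogue.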
JL Peyret

We can do this with regex, without providing the speaker names in advance.

import re

string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
dialog_count = {}

#extract speakers using regex
speakers = re.findall(r'\w+:',string)
#split the text into each speaker's sentences using the same regex
sentences = re.split(r'\w+:',string)
#drop empty entries (e.g. the empty string before the first speaker)
speakers = filter(lambda x: x.strip()!='' ,speakers)
sentences = filter(lambda x: x.strip()!='' ,sentences)

#pair each speaker with its sentences
dialogs = zip(list(speakers),list(sentences))

#count total words
for speaker,dialog in dialogs:
    dialog = dialog.split(" ")
    dialog = list(filter(lambda x: x.strip()!='',dialog))
    dialog_count[speaker] = dialog_count.get(speaker,0) + len(dialog)
print(dialog_count)

{'Michael:': 38, 'Jim:': 17}
nay