How to find a required word in a novel in Python?

Question

I have a text and I have been given a Python task that involves reading files:

Find the names of people who are referred to as Mr. XXX. Save the result in a dictionary with the name as key and number of times it is used as value. For example:

If Mr. Churchill is in the novel, then include {'Churchill' : 2}
If Mr. Frank Churchill is in the novel, then include {'Frank Churchill' : 4}

The file is .txt and it contains around 10-15 paragraphs.

Do you have any ideas about how it can be improved? (It raises an error after some words. I guess the error happens because one of the Mr. tokens is at the end of a line.)

orig_text= open('emma.txt', encoding = 'UTF-8')
lines= orig_text.readlines()[32:16267]
counts = dict()
for line in lines:
    wordsdirty = line.split()
    try:
        print (wordsdirty[wordsdirty.index('Mr.') + 1])
    except ValueError:
        continue

If you are asking for people to improve your code, go check out [Code Review](https://codereview.stackexchange.com/). — Have a nice day, May 18 '21 at 16:33

score 0 · Answer 1 · answered May 18 '21 at 16:39

Try this:

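import re
text = "When did Mr. Churchill told Mr. James Brown about the fish"
# with two groups in the pattern, findall returns tuples; x[0] is the outer group (the whole match)
m = [x[0] for x in re.findall(r'(Mr\.( [A-Z][a-z]*)+)', text)]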

You get:

['Mr. Churchill', 'Mr. James Brown']

To solve the line issue, read the entire file into a single string instead of splitting it into lines:

with open('emma.txt', encoding='UTF-8') as file:
    text = file.read()

Then, to count the occurrences, simply run:

from collections import Counter
Counter(m)

Finally, if you'd like to drop 'Mr. ' from all your dictionary entries, use x[0][4:] instead of x[0].
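
Putting the steps above together, a minimal sketch might look like the following. The names `MR_NAME` and `count_mr_names` are hypothetical and not part of the original answer. The pattern also departs from the answer in two ways, both assumptions: it uses `\s+` rather than a single space, so that a name continuing on the next line is still matched once the whole file is read as one string, and it requires at least one lowercase letter after the capital.

```python
import re
from collections import Counter

# Assumption: a name is one or more capitalised words following "Mr.".
# \s+ (instead of a single space) also accepts a line break after "Mr.".
MR_NAME = re.compile(r"\bMr\.((?:\s+[A-Z][a-z]+)+)")


def count_mr_names(text):
    """Return {name: count} for every 'Mr. <Name ...>' found in text."""
    # Collapse any whitespace inside the captured name (including newlines)
    # to single spaces, so "Frank\nChurchill" and "Frank Churchill" agree.
    names = (" ".join(m.group(1).split()) for m in MR_NAME.finditer(text))
    return dict(Counter(names))


sample = (
    "Mr. Churchill met Mr. Frank Churchill. Later Mr.\n"
    "Churchill left with Mr. Frank\nChurchill."
)
print(count_mr_names(sample))  # {'Churchill': 2, 'Frank Churchill': 2}
```

For the novel, the same function could be called on the string returned by `file.read()`. One limitation of this sketch: a capitalised word that directly follows the name without punctuation (for example the first word of the next line) would be absorbed into the name, so the counts may need a manual check.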

score 0 · Answer 2 · answered May 18 '21 at 16:40


This can be easily done using a regex with a capturing group.

In this scenario, you might want to do something like:

import re
from collections import Counter

# retrieve a list of strings that match your regex
matches = re.findall(r"Mr\. ([a-zA-Z]+)", your_entire_file)  # not sure about the regex

# then create a dictionary and count the occurrences of each match
# if you are allowed to use modules, this can be done using Counter
Counter(matches)

To access the entire file like that, you might want to map it into memory.
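
As a rough illustration of the memory-mapping idea, the sketch below searches a memory-mapped file with the same single-word pattern as above. The function name `count_mr_words_mmap` is hypothetical. Because a memory map exposes bytes, the pattern is a bytes pattern and the matches are decoded afterwards, here assuming the file is UTF-8. For a short text, a plain `read()` is likely just as good.

```python
import mmap
import re
from collections import Counter


def count_mr_words_mmap(path):
    """Count the single word following 'Mr. ' in a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A bytes pattern is required because the map holds bytes.
            matches = re.findall(rb"Mr\. ([a-zA-Z]+)", mm)
    # Assumption: the file is UTF-8 encoded.
    return dict(Counter(m.decode("utf-8") for m in matches))
```

Like the snippet above, this only captures the first word after `Mr.`, so a two-word name is counted under its first word. Note also that an empty file cannot be memory-mapped, so the sketch assumes the file has content.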

answered May 18 '21 at 16:40

ozerodb


You don't deal with names like Mr. Tom Smith – rudolfovic May 18 '21 at 16:45
@rudolfovic true, I didn't consider that scenario when writing my answer – ozerodb May 18 '21 at 16:49
