This is the first part of a MapReduce problem I am working on. I need a function that yields 1 if a word starts with a vowel, and 0 otherwise.

The program is run in the terminal by piping a text file to the mapper like so:

cat test.txt | python3 mapper.py

Here is the expected output for a text file containing the string "its a beautiful life":

i 1 1 0   
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0  
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0 
f 1 0 0
e 1 0 1

I got the first two columns working, but I am having trouble with the third. It should be 1 on the first letter of a word when that letter is a vowel, and 0 on every other letter.

My current output looks like:

i 1 1 0
t 1 1 0
s 1 1 0
a 1 1 0
b 1 1 0
e 1 1 0
a 1 1 0
u 1 1 0
t 1 1 0
i 1 1 0
f 1 1 0
u 1 1 0
l 1 1 0
l 1 1 0
i 1 1 0
f 1 1 0
e 1 1 0

Here is the code I have written so far:

import sys
import re
pattern = re.compile("^[a-z]+$") # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])") # matches starting vowels 
ending_vowels = re.compile("[aeiouAEIOU]$") # matches ending vowels
# starting_vowel_match = 0
ending_vowel_match = 0

def first_vowel():
    for token in tokens:
        if starting_vowels.match(token[0]):
            yield '1'
        else:
            yield '0'

for line in sys.stdin:
    line = line.strip() # removes leading and trailing whitespace
    tokens = line.split() # splits words into list, needed for part 2
    mashed_line = line.replace(" ","")
    lower_mashed_line = mashed_line.lower()
    for letter in lower_mashed_line: 
        if pattern.match(letter): # if pattern matches, prints 'word 1'
            print('%s 1' % letter, next(first_vowel()), ending_vowel_match)
Zaku

2 Answers

Iterate over words and letters separately so you know which word each letter belongs to, and reset starting_vowel_match after the first letter:

import sys
import re

pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])")  # matches starting vowels
ending_vowels = re.compile("[aeiouAEIOU]$")  # matches ending vowels
starting_vowel_match = 0
ending_vowel_match = 0

for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        starting_vowel_match = 1 if starting_vowels.match(word[0]) else 0
        # ternary operator, word[0] is the first letter of the word

        for letter in word:
            if pattern.match(letter):
                print("%s 1" % letter, starting_vowel_match, ending_vowel_match)
                starting_vowel_match = 0 # reset starting vowel match after first letter

Kyle F Hartzenberg

The generator is never advanced properly: each call to first_vowel() creates a new generator, so next() always returns the flag for the first word. It is simpler to drop the generator and check inside the loop whether the first letter of the word is a vowel:

import sys
import re

pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("^[aeiouAEIOU]")  # matches starting vowels

for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    tokens = line.split()  # splits words into list, needed for part 2
    lower_mashed_line = ''.join(tokens).lower()
    token_index = 0
    for letter in lower_mashed_line:
        if pattern.match(letter):
            is_first_vowel = 1 if starting_vowels.match(tokens[token_index][0]) else 0
            print('%s 1' % letter, is_first_vowel)
            token_index += 1
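
To see why the question's output shows 1 on every row, here is a small sketch of the generator behaviour. It hard-codes `tokens` for the sample sentence and swaps the regex for a plain membership test; both are simplifications for illustration, not the original code.

```python
tokens = ["its", "a", "beautiful", "life"]


def first_vowel():
    # Same shape as the generator in the question: one flag per token.
    for token in tokens:
        yield "1" if token[0] in "aeiouAEIOU" else "0"


# Calling first_vowel() builds a brand-new generator every time, so next()
# always returns the flag for tokens[0].
fresh_each_time = [next(first_vowel()) for _ in range(4)]

# Creating the generator once and reusing it advances through the tokens.
flags = first_vowel()
reused = [next(flags) for _ in range(4)]

print(fresh_each_time)  # ['1', '1', '1', '1']
print(reused)           # ['1', '1', '0', '0']
```

Even the reused generator yields one flag per word rather than one per letter, so it would still not line up with the letter loop. That is why dropping it is the simpler route.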
Saxtheowl
  • I am getting a list index out of range error and I am not sure why. It seems like `tokens[token_index][0]` should be getting the first letter from each word, and then token_index increments. – Zaku Apr 21 '23 at 06:13
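
The index error in the comment is consistent with `token_index` advancing once per letter while `tokens` holds only one entry per word. The sample sentence has 4 words and 17 letters, so the fifth letter asks for `tokens[4]`. One possible way around it is sketched below: keep the mashed-line loop, but move to the next word only once the current one is used up. `rows_for_line` and `char_index` are hypothetical names introduced here, and the flag is set only on a word's first letter, to match the expected output in the question.

```python
import re

pattern = re.compile("^[a-z]$")  # a single lowercase ASCII letter


def rows_for_line(line):
    """Return 'letter 1 flag' rows for one line of input."""
    tokens = line.lower().split()
    rows = []
    token_index = 0  # which word the current letter belongs to
    char_index = 0   # position of the current letter inside that word
    for letter in "".join(tokens):
        if pattern.match(letter):
            # 1 only for the first character of a word, and only if it is a vowel
            is_first_vowel = 1 if char_index == 0 and letter in "aeiou" else 0
            rows.append("%s 1 %d" % (letter, is_first_vowel))
        char_index += 1
        if char_index == len(tokens[token_index]):
            # the current word is used up, so move on to the next one
            token_index += 1
            char_index = 0
    return rows


# Sample sentence from the question, used here instead of reading sys.stdin.
for row in rows_for_line("its a beautiful life"):
    print(row)
```

In the mapper itself, the same function could be called once per line read from `sys.stdin`.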