This is the first part of a MapReduce problem I am working on. I need a function that yields 1 if a word starts with a vowel and 0 otherwise.
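A minimal sketch of such a generator is below. It assumes the words have already been split on whitespace and that only a, e, i, o, u (in either case) count as vowels, so "y" is not treated as one. The name `starts_with_vowel_flags` is hypothetical.

```python
VOWELS = set("aeiouAEIOU")  # assumption: 'y' is not counted as a vowel

def starts_with_vowel_flags(words):
    """Yield 1 for each word whose first letter is a vowel, else 0."""
    for word in words:
        # word[:1] is '' for an empty string, and '' is not in VOWELS
        yield 1 if word[:1] in VOWELS else 0

print(list(starts_with_vowel_flags("its a beautiful life".split())))  # [1, 1, 0, 0]
```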
The program is run in the terminal by piping a text file to the mapper like so:
cat test.txt | python3 mapper.py
For a text file containing the string
its a beautiful life
the expected output should look like this:
i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
I have the first two columns working, but I am having trouble with the third. It should be 1 on the row for the first letter of a word when that letter is a vowel, and 0 otherwise.
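Because the flag belongs to a single row (the first letter of a word), one way to get it right is to compute the rows word by word rather than letter by letter across the whole line. The sketch below assumes the fourth column marks a vowel in the last position of a word, which is consistent with the rows for `a` and for the final `e` of `life` in the expected output. `letter_rows` is a hypothetical helper name.

```python
VOWELS = "aeiou"  # assumption: 'y' is not counted as a vowel

def letter_rows(word):
    """Return (letter, 1, starts_flag, ends_flag) for each letter of word."""
    word = word.lower()
    last = len(word) - 1
    rows = []
    for i, letter in enumerate(word):
        # column 3: only the first letter can be flagged
        starts = 1 if i == 0 and letter in VOWELS else 0
        # column 4 (assumed meaning): only the last letter can be flagged
        ends = 1 if i == last and letter in VOWELS else 0
        rows.append((letter, 1, starts, ends))
    return rows

for row in letter_rows("a"):
    print(*row)  # a 1 1 1
```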
My current output looks like:
i 1 1 0
t 1 1 0
s 1 1 0
a 1 1 0
b 1 1 0
e 1 1 0
a 1 1 0
u 1 1 0
t 1 1 0
i 1 1 0
f 1 1 0
u 1 1 0
l 1 1 0
l 1 1 0
i 1 1 0
f 1 1 0
e 1 1 0
Here is the code I have written so far:
import sys
import re

pattern = re.compile("^[a-z]+$") # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])") # matches starting vowels
ending_vowels = re.compile("[aeiouAEIOU]$") # matches ending vowels
# starting_vowel_match = 0
ending_vowel_match = 0

def first_vowel():
    for token in tokens:
        if starting_vowels.match(token[0]):
            yield '1'
        else:
            yield '0'

for line in sys.stdin:
    line = line.strip() # removes leading and trailing whitespace
    tokens = line.split() # splits words into list, needed for part 2
    mashed_line = line.replace(" ","")
    lower_mashed_line = mashed_line.lower()
    for letter in lower_mashed_line:
        if pattern.match(letter): # if pattern matches, prints 'word 1'
            print('%s 1' % letter, next(first_vowel()), ending_vowel_match)
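The constant third column comes from `next(first_vowel())`: every call builds a brand-new generator that starts over at `tokens[0]`, so it always reports on `its` and returns '1'. Beyond that, `first_vowel` judges whole words while the print loop walks individual letters of the mashed line, so there is no link between a letter and the word it came from. `ending_vowel_match` is never updated either, which is why the last column is always 0. A sketch of one way to restructure the mapper, looping over words and then over letters so each letter knows its position, is below. It assumes the fourth column flags a vowel in the last position of a word, and that non-letter characters are dropped before deciding which letter is first or last; `mapper` is a hypothetical function name.

```python
import re
import sys

letter_pattern = re.compile(r"[a-z]")  # a single lowercase ASCII letter
VOWELS = "aeiou"  # assumption: 'y' is not counted as a vowel

def mapper(lines):
    """Yield one 'letter 1 starts ends' string per letter of every word."""
    for line in lines:
        for token in line.split():
            # assumption: drop non-letters, mirroring the original pattern check
            word = [c for c in token.lower() if letter_pattern.fullmatch(c)]
            for i, letter in enumerate(word):
                starts = int(i == 0 and letter in VOWELS)              # column 3
                ends = int(i == len(word) - 1 and letter in VOWELS)    # column 4
                yield f"{letter} 1 {starts} {ends}"

if __name__ == "__main__":
    for out in mapper(sys.stdin):
        print(out)
```

It is run the same way as before, `cat test.txt | python3 mapper.py`, and for the sample line `its a beautiful life` it produces the 17 rows of the expected output. Keeping the logic in a generator that takes any iterable of lines also makes it easy to test without piping a file.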