How to return 1 if the first letter of a word is a vowel, return 0 otherwise. Mapper (MapReduce) problem

Question

This is the first part of a MapReduce problem I am working on. I need a function that yields 1 if a word starts with a vowel and 0 otherwise.

The program is run in the terminal by piping a text file to the mapper like so:

cat test.txt | python3 mapper.py

Here is what expected output should look like with a text file containing the string its a beautiful life:

I was successful in completing the assignment for the output of the first two columns, but I am having trouble with the third column. It is supposed to yield 1 if the first letter of a word is a vowel, and 0 otherwise.

My current output looks like:

Here is the code I have written so far:

import sys
import re
pattern = re.compile("^[a-z]+$") # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])") # matches starting vowels 
ending_vowels = re.compile("[aeiouAEIOU]$") # matches ending vowels
# starting_vowel_match = 0
ending_vowel_match = 0

def first_vowel():
    for token in tokens:
        if starting_vowels.match(token[0]):
            yield '1'
        else:
            yield '0'

for line in sys.stdin:
    line = line.strip() # removes leading and trailing whitespace
    tokens = line.split() # splits words into list, needed for part 2
    mashed_line = line.replace(" ","")
    lower_mashed_line = mashed_line.lower()
    for letter in lower_mashed_line: 
        if pattern.match(letter): # if pattern matches, prints 'word 1'
            print('%s 1' % letter, next(first_vowel()), ending_vowel_match)

score 1 · Answer 1 · edited Apr 24 '23 at 04:06

Iterate over words and letters separately so you know which word each letter belongs to, then reset starting_vowel_match after the first letter:

import sys
import re

pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])")  # matches starting vowels
ending_vowels = re.compile("[aeiouAEIOU]$")  # matches ending vowels
starting_vowel_match = 0
ending_vowel_match = 0

for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        starting_vowel_match = 1 if starting_vowels.match(word[0]) else 0
        # ternary operator, word[0] is the first letter of the word

        for letter in word:
            if pattern.match(letter):
                print("%s 1" % letter, starting_vowel_match, ending_vowel_match)
                starting_vowel_match = 0 # reset starting vowel match after first letter
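
The same logic can also be written without regular expressions. The following is only a minimal sketch, not part of the original answer: the names map_line and VOWELS are hypothetical, and it assumes the third column should be 1 only for a word's first letter when that letter is a vowel. The ending-vowel column is left at 0, as in the code above.

```python
VOWELS = set("aeiou")  # hypothetical helper constant

def map_line(line):
    """Return one 'letter 1 start_flag end_flag' string per ASCII letter in line."""
    rows = []
    for word in line.lower().split():
        for index, letter in enumerate(word):
            if not ("a" <= letter <= "z"):
                continue  # skip digits and punctuation, like the regex filter
            # 1 only when this is the word's first character and it is a vowel
            starts_with_vowel = 1 if index == 0 and letter in VOWELS else 0
            rows.append(f"{letter} 1 {starts_with_vowel} 0")
    return rows

for row in map_line("its a beautiful life"):
    print(row)
```

In the mapper itself, the final loop would presumably run over sys.stdin and call map_line on each line instead of on a fixed string.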

edited Apr 24 '23 at 04:06 by Kyle F Hartzenberg
answered Apr 21 '23 at 03:25 by Faku Venturi

For some reason t and s are returning 1 when they should return 0. – Zaku Apr 21 '23 at 06:02
Because 't' and 's' are part of "its", so the first letter of the word is a vowel. Do you need to return 1 only if the letter is a vowel AND the first letter of the word? – Faku Venturi Apr 21 '23 at 06:14
@Faku Venturi yes. – Zaku Apr 21 '23 at 06:16
@Zaku Edited, reset starting vowel match after first letter – Faku Venturi Apr 21 '23 at 06:22

score 0 · Answer 2 · answered Apr 21 '23 at 03:01

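Check whether the first letter of the word is a vowel directly inside the loop. Your code does not iterate through the generator properly: next(first_vowel()) builds a new generator on every call, so it always returns the result for the first token. Let's just remove it.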
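
To see the generator problem in isolation, here is a small sketch. The tokens list is made up for illustration, and the generator uses a plain membership test instead of the question's regex. Calling first_vowel() creates a fresh generator each time, so next(first_vowel()) always yields the flag for the first token; keeping a single generator object and calling next() on it repeatedly is what actually advances through the tokens.

```python
tokens = ["its", "a", "beautiful", "life"]  # illustrative input

def first_vowel():
    # same idea as the question's generator, one flag per token
    for token in tokens:
        yield "1" if token[0] in "aeiouAEIOU" else "0"

# A new generator on every call: always the first token's flag.
fresh_each_time = [next(first_vowel()) for _ in range(4)]

# One generator reused: advances through the tokens.
flags = first_vowel()
reused = [next(flags) for _ in range(4)]

print(fresh_each_time)  # ['1', '1', '1', '1']
print(reused)           # ['1', '1', '0', '0']
```

Even the reused generator produces one flag per word rather than one per letter, so it would still not line up with a loop that prints every letter.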

import sys
import re

pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("^[aeiouAEIOU]")  # matches starting vowels

for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    tokens = line.split()  # splits words into list, needed for part 2
    lower_mashed_line = ''.join(tokens).lower()
    token_index = 0
    for letter in lower_mashed_line:
        if pattern.match(letter):
            is_first_vowel = 1 if starting_vowels.match(tokens[token_index][0]) else 0
            print('%s 1' % letter, is_first_vowel)
            token_index += 1

I am getting a list index out of range error. I am not sure why. It seems like `tokens[token_index][0]` should be getting the first letter from each word and then token_index increments. – Zaku Apr 21 '23 at 06:13
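
The error appears to come from token_index being incremented once per letter while tokens holds only one entry per word. For the input its a beautiful life there are 4 tokens but 17 letters, so the lookup fails on the fifth letter. The sketch below first reproduces that failure and then shows one possible fix, looping over the tokens and enumerating the letters of each. It assumes the flag should be 1 only for a word's first letter when that letter is a vowel; failed_at and rows are names introduced here for illustration.

```python
tokens = "its a beautiful life".split()
letters = "".join(tokens).lower()

# Reproduce the failure: one index step per letter, but only len(tokens) words.
token_index = 0
failed_at = None
for letter in letters:
    try:
        tokens[token_index][0]
    except IndexError:
        failed_at = (letter, token_index)
        break
    token_index += 1

print(len(tokens), len(letters), failed_at)  # 4 17 ('b', 4)

# Looping over the tokens first removes the need for a manual index.
rows = []
for token in tokens:
    for position, letter in enumerate(token.lower()):
        if "a" <= letter <= "z":
            # 1 only when this letter opens the word and is a vowel
            flag = 1 if position == 0 and letter in "aeiou" else 0
            rows.append("%s 1 %d" % (letter, flag))

print(rows[:4])  # ['i 1 1', 't 1 0', 's 1 0', 'a 1 1']
```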
