1

This is from my previous post here:

How to return 1 only if the last letter of a word is a vowel? Return 0 otherwise

Here is the code I am using:

import sys
import re

pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])")  # matches starting vowels
ending_vowels = re.compile("[aeiouAEIOU]$")  # matches ending vowels
starting_vowel_match = 0
ending_vowel_match = 0

for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        if len(word) == 1:
            print(word[0], 1, *((1, 1) if word[0] in 'aeiou' else (0, 0))) # * unpacks startVowel 1 endVowel 1 if word[0] is a vowel
        else:
            print(word[0], 1, 1 if word[0] in 'aeiou' else 0, 0) 
            print(*(f'{letter} 1 0 0' for letter in word[1: -1]), sep='\n')
            print(word[-1], 1, 0, 1 if word[-1] in 'aeiou' else 0)

I want this to print a line only when the character is a letter. For a text file containing the string "It's a beautiful life", the output I would like is:

i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1

I am currently seeing this:

i 1 1 0
' 1 0 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1

I am wondering how to get rid of special characters in the output. I have tried a couple of things, including adding

        for letter in word:
            if pattern.match(letter):

in the "for letter in word" block, but it does not produce the output I want.

ndc85430
Zaku

2 Answers

2

Not sure why the original code does some work with re as it's never used.

When analysing a word of more than 1 letter, you need to check each character in the [1:-1] slice individually and skip anything that is not a letter.

Something like this:

import sys
from string import ascii_lowercase as LOWER

VOWELS = set('aeiou')

def isvowel(c):
    return int(c in VOWELS)

for line in sys.stdin:
    for word in line.strip().lower().split():
        if len(word) == 1:
            print(word, 1, isvowel(word), isvowel(word))  # a one-letter word is both its first and its last letter
        else:
            print(word[0], 1, isvowel(word[0]), 0)
            for letter in word[1:-1]:
                if letter in LOWER:
                    print(f'{letter} 1 0 0')
            print(word[-1], '1 0', isvowel(word[-1]))

Output:

i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
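
Note that the loop above only filters the middle of each word, so a word that starts or ends with a special character (for example 'tis) would still print that character. A minimal sketch of one way around this, assuming it is acceptable to strip everything that is not a lowercase ASCII letter before looking at positions; the function name letter_rows is made up for this illustration and the text is passed in as a string rather than read from sys.stdin:

```python
from string import ascii_lowercase

VOWELS = set("aeiou")


def letter_rows(text):
    """Yield 'letter 1 start_vowel end_vowel' rows for every letter in text."""
    for word in text.lower().split():
        # Drop apostrophes, digits and other non-letters before looking at positions
        letters = [c for c in word if c in ascii_lowercase]
        last = len(letters) - 1
        for i, letter in enumerate(letters):
            is_vowel = letter in VOWELS
            starts = int(i == 0 and is_vowel)   # 1 only for a leading vowel
            ends = int(i == last and is_vowel)  # 1 only for a trailing vowel
            yield f"{letter} 1 {starts} {ends}"


for row in letter_rows("It's a beautiful life"):
    print(row)
```

Because the positions are computed on the filtered letters, a one-letter word needs no special case: its only letter is both first and last.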
DarkKnight
  • The original code had re because it was part of some boilerplate code given to me for a mapreduce problem I am working on. It might be needed later. – Zaku Apr 28 '23 at 00:01
-1

So you want to split a string into words and every word into alphabetical letters. For each letter you want to print:

[letter] [starting_vowel_match] [letter_vowel_match] [ending_vowel_match]

Here would be my approach to this problem:

import re

test = "It's a beautiful life"

for line in test.split("\n"):
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        for letter in re.sub(r'[^a-z]', '', word):  # word is already lowercase; keep letters only, not digits
            print(
                letter, 
                1 if word[0] in 'aeiou' else 0,
                1 if letter in 'aeiou' else 0, 
                1 if word[-1] in 'aeiou' else 0)

The result looks different from your example output, but I expected the first number in each row to be the starting_vowel_match!

i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 0 0 0
e 0 1 0
a 0 1 0
u 0 1 0
t 0 0 0
i 0 1 0
f 0 0 0
u 0 1 0
l 0 0 0
l 0 0 1
i 0 1 1
f 0 0 1
e 0 1 1
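
If you want the layout from the question instead (letter, a constant 1, then a start flag only on the first letter and an end flag only on the last letter), the same regex idea can be adapted. This is only a sketch under that assumption: starting_vowels and ending_vowels are the pattern names from the question, simplified to lowercase because the text is lowercased first, while non_letters and rows_with_regex are names I made up for the example.

```python
import re

non_letters = re.compile(r"[^a-z]")        # anything that is not a lowercase ASCII letter
starting_vowels = re.compile(r"^[aeiou]")  # word starts with a vowel
ending_vowels = re.compile(r"[aeiou]$")    # word ends with a vowel


def rows_with_regex(text):
    """Return 'letter 1 start_vowel end_vowel' rows for every letter in text."""
    rows = []
    for word in text.lower().split():
        cleaned = non_letters.sub("", word)  # remove special characters and digits
        if not cleaned:
            continue  # the word contained no letters at all
        starts = int(bool(starting_vowels.match(cleaned)))
        ends = int(bool(ending_vowels.search(cleaned)))
        last = len(cleaned) - 1
        for i, letter in enumerate(cleaned):
            # The start flag applies to the first letter only, the end flag to the last
            rows.append(f"{letter} 1 {starts if i == 0 else 0} {ends if i == last else 0}")
    return rows


print("\n".join(rows_with_regex("It's a beautiful life")))
```

Cleaning the word before matching means the flags are based on the first and last letter rather than on a stray apostrophe or digit.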
DTNGNR