I want to count how many words in a list of 5-letter words contain each two-character vowel permutation. The vowel permutations are the 25 ordered pairs 'aa', 'ae', 'ai', ..., 'ui', 'uo', 'uu'.
I can get this done using apply(), but it is slow. Is there a fast, vectorized way to do this? I can't think of one.
Here's what I did:
import pandas as pd
import itertools
vowels = list('aeiou')
vowel_perm = [x[0]+x[1] for x in itertools.product(vowels,vowels)]
def wide_contains(x):
    # One boolean per vowel pair: True if the pair is a substring of the word x
    return pd.Series(data=[c in x for c in vowel_perm], index=vowel_perm)
dfwd['word'].apply(wide_contains).sum()
aa 1
ae 2
ai 12
ao 2
au 8
ea 15
ee 15
ei 1
eo 5
eu 2
ia 7
ie 10
ii 0
io 3
iu 0
oa 2
oe 2
oi 3
oo 11
ou 7
ua 2
ue 9
ui 2
uo 0
uu 0
The output above is the expected result for the following data:
word_lst = ['gaize', 'musie', 'dauts', 'orgue', 'tough', 'medio', 'roars', 'leath', 'quire', 'kaons', 'iatry', 'tuath', 'tarea', 'hairs', 'sloid',
'beode', 'fours', 'belie', 'qaids', 'cobia', 'cokie', 'wreat', 'spoom', 'soaps', 'usque', 'frees', 'rials', 'youve', 'dreed', 'feute',
'saugh', 'esque', 'revue', 'noels', 'seism', 'sneer', 'geode', 'vicua', 'maids', 'fiord', 'bread', 'squet', 'goers', 'sneap', 'teuch',
'arcae', 'roosa', 'spues', 'could', 'tweeg', 'coiny', 'cread', 'airns', 'gauds', 'aview', 'mudee', 'vario', 'spaid', 'pooka', 'bauge',
'beano', 'snies', 'boose', 'holia', 'doums', 'goopy', 'feaze', 'kneel', 'gains', 'acoin', 'crood', 'juise', 'gluey', 'zowie', 'biali',
'leads', 'twaes', 'fogie', 'wreak', 'keech', 'bairn', 'spies', 'ghoom', 'foody', 'jails', 'waird', 'iambs', 'woold', 'belue', 'bisie',
'hauls', 'deans', 'eaten', 'aurar', 'anour', 'utees', 'sayee', 'droob', 'gagee', 'roleo', 'burao', 'tains', 'daubs', 'geeky', 'civie',
'scoop', 'sidia', 'tuque', 'fairy', 'taata', 'eater', 'beele', 'obeah', 'feeds', 'feods', 'absee', 'meous', 'cream', 'beefy', 'nauch']
dfwd = pd.DataFrame(word_lst, columns=['word'])
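
For comparison, here is a minimal sketch of one direction that avoids the row-wise apply(): loop over the 25 vowel pairs and let pandas test the whole column at once with Series.str.contains. The names vowel_pairs, sample_words and count_vowel_pairs are hypothetical, and the sample is just the first five words of the list above. Whether this is actually faster on the full data would need to be timed.

```python
import itertools

import pandas as pd

vowels = "aeiou"
# All 25 ordered vowel pairs: 'aa', 'ae', ..., 'uu'
vowel_pairs = ["".join(p) for p in itertools.product(vowels, repeat=2)]

# Hypothetical small sample: the first five words of the question's list
sample_words = pd.Series(["gaize", "musie", "dauts", "orgue", "tough"], name="word")


def count_vowel_pairs(words):
    """Count, for each vowel pair, how many words contain it as a substring."""
    return pd.Series(
        {
            # regex=False does a plain substring test over the whole column
            pair: int(words.str.contains(pair, regex=False).sum())
            for pair in vowel_pairs
        }
    )


pair_counts = count_vowel_pairs(sample_words)
print(pair_counts[pair_counts > 0])
```

On the real data this would be called as count_vowel_pairs(dfwd['word']). Like the apply() version, it counts each word at most once per pair, so a word containing the same pair twice still adds 1.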