I'm trying to turn telly into tely, i.e. collapse a doubled letter into a single one.
I've tried:
awk 'BEGIN {f="telly" ;print gensub(/(.)\\1/,"\\1","g",f)}'
and
awk 'BEGIN {f="telly" ;print gensub(/(.)\1/,"\\1","g",f)}'
but I still get telly in both cases.
I'm pretty sure I can do this (backreferences in the match expression) in sed, and probably in perl too. But I'm writing functions in awk because it makes processing multi-column data simpler than hacking out the columns in sed.
For example, I am running several different processes on a lexicon I'm working with.
Here is an example of some failed output. The third column of connoisseur should not have a double s or n.
otto ottô ottô o-tt--ô ottô 11025
hindu hindü hindö hind--ü hndü 11250
wearily weárílý weérélê weáríl--ý wrlý 11251
nora nørá nøré nør--á nrá 11252
formulate før#mûlâtè fømûlât før#mûlât--è fr#mltè 11253
embryo embrýô embrêô e-mbr--ýô embrýô 11254
stylish stŷliŝħ stîliŝ stŷliŝ--ħ stlŝħ 11255
eruption ėrupţìòn irupŝn ė-rupţìòn ėrpţn 11256
authoritarian auπħorítã#rïán auπoréte#rêén au-πħorítã#rïán auπrt#rn 11258
untouched untóùĉħèð untéĉt u-ntóùĉħèð untĉð 11425
penry penrý penrê penr--ý pnrý 11625
maze mâzè mâz mâz--è mzè 11725
forge før#ĝè føj før#ĝ--è fr#ĝè 11825
ferrari fèŕrārï fŕrārê fèŕrār--ï frrï 12511
assailant ássâìlánt éssâlént á-ssâìlánt ásslnt 25011
corrosive còŕr0ôsivè cŕôsiv còŕr0ôsiv--è cr0svè 25111
daimler dâìmlèŕ dâmlŕ dâìml--èŕ dmlèŕ 25311
connoisseur connoíssèùŕ connoéssŕ connoíss--èùŕ cnnssèùŕ 25511
airframe ãìŕfrâmè eŕfrâm ãìŕ-frâm--è ãìŕfrmè 25911
ampersand ampèŕsand ampŕsand a-mpèŕsand ampsnd 62511
The input is three or four columns per line, and I want to process it field by field rather than line by line, hence the use of awk.
Just for info, here is a tiny snippet of the input:
,"accepted","acçeptėd","1118"
,"ellis","ellis","7111"
,"woollen","wōòllén","11111"
,"hurricane","hurrícânè","11113"
,"fuelled","fûéllèd","11114"
,"groom","gröòm","11132"
,"preferring","prėfèŕriñg0","11134"
,"uttered","uttèŕèd","11138"
,"surrendered","sùŕr0endèŕèd","11141"
,"differentiate","différenţïâtè","11145"
,"exceeding","ėxc0êèdiñg0","11146"
,"groove","gröòvè","11148"
,"floppy","floppý","11163"
,"butterflies","buttèŕflîèś","11165"
,"ee","êè","11167"
,"cartoon","cār#töòn","11170"
,"slapped","slappèð","11172"
,"scattering","scattériñg0","11178"
,"jubilee","jübílêè","11179"
,"buzzing","buzziñg0","16111"
,"whipping","wħippiñg0","19111"
,"missus","missμś","21111"
,"corrosive","còŕr0ôsivè","25111"
,"alluring","állūriñg0","31110"
,"confidentially","confídenţìállý","34111"
,"antenna","antenná","35111"
,"whoosh","wħöòŝħ","41114"
,"fattened","fatténèd","49111"
,"cobble","cobblè","61116"
Here are the final lines of the awk file I'm using. It calls functions directly on the fields, which is why I am using awk. The third column of the output comes from a disambiguate function. I had put a gensub in that function to try to 'singl-ify' the double letters.
some code with functions in . . .
BEGIN {FS= "\"" }
{print $2,$4,disambiguate($4),isolate_terminal_vowels($4),devowelCentre(isolate_terminal_vowels($4)),$6}
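For what it's worth, here is a fallback I could use if gensub really can't do this. As far as I can tell, awk regexps (gawk included) accept backreferences only in the gensub replacement, not in the pattern, so the doubled letters would have to be collapsed with a loop instead. A minimal sketch, assuming a hypothetical helper called singlify (it is not in my real file), wrapped in a shell function only so it can be run on its own:

```shell
# Hypothetical helper, not part of the real awk file.
# Collapses every run of a repeated character to a single character
# by walking the string, since the regexp itself cannot refer back to a group.
singlify() {
  awk -v s="$1" '
    function singlify(str,    i, c, prev, out) {
      out = ""
      prev = ""
      for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        if (c != prev)        # keep a character only if it differs from the previous one
          out = out c
        prev = c
      }
      return out
    }
    BEGIN { print singlify(s) }'
}

singlify telly         # prints tely
singlify connoisseur   # prints conoiseur
```

Inside the awk file the function body could presumably be called from disambiguate in place of the gensub. Note that it collapses a run of any length, not just pairs, and that substr counts characters only in a UTF-8-aware awk such as gawk. In a byte-oriented awk, doubled accented letters might not be collapsed.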
Thanks.