I need to check a column in a CSV file for valid email addresses, keeping the valid ones and removing invalid data from that column. I already have an AWK command with a simple regex, but some invalid emails are not filtered out by it. Here is the command:
awk 'BEGIN{FS=OFS=","}{$1=match($1,/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}/)?substr($1,RSTART,RLENGTH):"";print}'
I want to replace this regex with an RFC 5322 compliant one. I found the following regex, but it doesn't work when I put it into the awk command above. How can I use this pattern in that AWK command?
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
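One thing to watch when swapping in a stricter pattern: the pattern above has no `^`/`$` anchors, and awk's `match()` reports the leftmost-longest substring that matches anywhere in the field. The minimal sketch below illustrates this with a simplified dot-atom pattern; the function name `show_anchoring` and the pattern inside it are illustrative assumptions, not the RFC 5322 pattern itself.

```shell
#!/bin/sh
# Illustration only: a simplified strict pattern (dot-atom local part,
# dotted domain), NOT the full RFC 5322 pattern.
show_anchoring() {
  LC_ALL=C awk '{
    re = "[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+"
    # Unanchored: match() finds the leftmost-longest substring that fits
    if (match($0, re)) print "unanchored: " substr($0, RSTART, RLENGTH)
    else print "unanchored: no match"
    # Anchored: the whole field has to fit the pattern
    if ($0 ~ ("^" re "$")) print "anchored: match"
    else print "anchored: no match"
  }'
}

echo '02142..6584@nadlanu.com' | show_anchoring
```

Here the unanchored match pulls `6584@nadlanu.com` out of `02142..6584@nadlanu.com`, so a `match()`/`substr()` command would keep a truncated address instead of blanking the field, while the anchored test rejects the field.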
CSV sample:
-pedja-@mail.ru,abd
0.5maratonac@gmail.com,534
00dovla.@gmail.com,5rfrf
015.josa@gmail.com,54rf
02142..6584@nadlanu.com,54r4
0616080668.boki@gmail.com,5443
0@0..com,344545
.100.three.7@gmail.com,64
10867249ld@emailgg.xyz,54444
john@,4355
I tried the command below:
awk 'BEGIN{FS=OFS=","}{$1=match($1,/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}/)?substr($1,RSTART,RLENGTH):"";print}'
Expected output:
-pedja-@mail.ru,abd
0.5maratonac@gmail.com,534
,5rfrf
015.josa@gmail.com,54rf
,54r4
0616080668.boki@gmail.com,5443
,344545
,64
10867249ld@emailgg.xyz,54444
,4355
(00dovla.@gmail.com, 02142..6584@nadlanu.com, 0@0..com, .100.three.7@gmail.com and john@ are not valid emails, so they are removed.)
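A sketch of one way to port the pattern, assuming plain comma-separated input (no quoted fields containing commas) and staying within POSIX awk features. awk uses POSIX extended regular expressions, so the PCRE pattern needs a few changes: `(?:...)` becomes a plain `(...)` group; the single quote inside the pattern would end the shell's single-quoted awk program, so it is written as the octal escape `\047`; `[a-z]` gains `A-Z` because the pattern is presumably meant to match case-insensitively; `{3}` is spelled out because some older awks lack interval expressions; and the whole pattern is wrapped in `^...$` so the entire field has to be an address. `LC_ALL=C` keeps ranges such as `A-Z` to ASCII. Two parts are simplified rather than translated literally, because POSIX awk has no `\xHH` escapes: the quoted local part uses `[^"\\]` and `\\.` instead of the hex ranges, and the general address literal uses RFC 5322 dtext (`[!-Z^-~]`) in place of `\x21-\x5a\x53-\x7f`. The function name `filter_emails` is hypothetical.

```shell
#!/bin/sh
# Sketch: blank column 1 unless the whole field matches a POSIX ERE port of
# the RFC 5322 pattern. Assumes plain comma-separated input.
filter_emails() {
  LC_ALL=C awk '
  BEGIN {
    FS = OFS = ","
    # atext characters; \047 is an octal escape for the single quote
    atext   = "[A-Za-z0-9!#$%&\047*+/=?^_`{|}~-]"
    dotatom = atext "+(\\." atext "+)*"
    # quoted local part, hex ranges simplified to [^"\\] and \\.
    quoted  = "\"([^\"\\\\]|\\\\.)*\""
    label   = "[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?"
    domain  = "(" label "\\.)+" label
    octet   = "(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"
    # (octet\.){3} spelled out for awks without interval expressions
    prefix  = octet "\\." octet "\\." octet "\\."
    # general address literal, using RFC 5322 dtext (%d33-90 / %d94-126)
    general = "[A-Za-z0-9-]*[A-Za-z0-9]:([!-Z^-~]|\\\\.)+"
    literal = "\\[" prefix "(" octet "|" general ")]"
    # anchored, so the whole field has to be an address
    email   = "^(" dotatom "|" quoted ")@(" domain "|" literal ")$"
  }
  {
    if ($1 !~ email) $1 = ""   # invalid: keep the column but empty it
    print
  }' "$@"
}
```

Usage: `filter_emails < input.csv > output.csv`. For example, `02142..6584@nadlanu.com,54r4` becomes `,54r4` and `john@,4355` becomes `,4355`, while `-pedja-@mail.ru,abd` is printed unchanged.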