My requirement:
I have a 20 GB tab-delimited text file. I want to use Perl, AWK, or grep to check whether the email address in the nth column is valid. A regex such as /^(\w|-|_|\.)+\@((\w|-|_)+\.)+[a-zA-Z]{2,}$/ should be fine, with one extra rule: consecutive dots or consecutive underscores are not allowed, so abc..cd@xyz.com and abc__cd@xyz.com should both be invalid. Valid addresses should be written to valid_email.txt and invalid ones to invalid_email.txt. The priority is to catch every invalid address, with good performance, because the file will keep growing.
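To make the requirement concrete, here is a minimal single-pass sketch in awk, not a definitive solution. The function name split_emails and its two arguments (file name, column number) are hypothetical names chosen for illustration. It assumes the two output file names from the requirement, uses bracket classes in place of \w (which not every awk supports), and writes [A-Za-z][A-Za-z]+ in place of {2,} for awks without interval expressions.

```shell
#!/bin/sh
# split_emails FILE COLUMN  (hypothetical helper, names are assumptions)
# Reads a tab-delimited FILE once and writes the address found in COLUMN
# to valid_email.txt or invalid_email.txt in the current directory.
split_emails() {
    # LC_ALL=C keeps the A-Z style ranges ASCII-only and is usually faster.
    LC_ALL=C awk -F '\t' -v col="$2" '
        # Valid: overall shape matches AND there is no ".." or "__" anywhere.
        $col ~ /^[A-Za-z0-9_.-]+@([A-Za-z0-9_-]+\.)+[A-Za-z][A-Za-z]+$/ && $col !~ /\.\.|__/ {
            print $col > "valid_email.txt"
            next
        }
        # Everything else, including empty fields, is treated as invalid.
        { print $col > "invalid_email.txt" }
    ' "$1"
}
```

It could be called as split_emails data.tsv 3, where data.tsv and column 3 are placeholders. Because the file is read only once and both outputs are written in the same pass, the cost should grow roughly linearly with file size.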
Edit/Update:
Would the command below catch at least 99% of invalid email address formats, or does it need further modification? Please feel free to post your opinions and suggestions.
To pull out the valid email IDs:
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" Raw_file.txt > Valid_Email_List.txt (where Raw_file.txt contains only email addresses)
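One thing to check with the command above: -o prints any matching substring, so a line such as abc..cd@xyz.com still produces output, because [A-Za-z0-9._%+-]+ accepts repeated dots. A possible variant is sketched below. It assumes one address per line, anchors the pattern to the whole line, and then drops lines containing ".." or "__". The function name keep_valid is hypothetical.

```shell
#!/bin/sh
# keep_valid FILE  (hypothetical helper, the name is an assumption)
# Prints only the lines of FILE that are entirely one well-formed address
# and contain neither consecutive dots nor consecutive underscores.
keep_valid() {
    # First grep: the whole line must match the address shape (^ ... $).
    # Second grep: -v removes lines with ".." or "__".
    LC_ALL=C grep -E '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$' "$1" |
        LC_ALL=C grep -E -v '\.\.|__'
}
```

It could be used as keep_valid Raw_file.txt > Valid_Email_List.txt. Note that {2,6} rejects top-level domains longer than six letters (for example .technology), so the upper bound may need to be raised or removed.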