I recently discovered some flaws in my users' data. Some of the registered emails contain characters with encodings other than UTF-8, so I'm trying to clean all of those emails with gsub. For now, I'm trying to find every flawed record using the following regex (explained at http://regexr.com/3bati):
/\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/
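A minimal sketch of using this regex to pick out flawed records: anything that does not match is treated as flawed. The `emails` array is hypothetical sample data standing in for the database column, and `EMAIL_RE` is just an illustrative constant name.

```ruby
# The regex from the question: one or more characters that are neither "@"
# nor whitespace, an "@", one or more "label." groups, and a final label
# made of word characters.
EMAIL_RE = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

# Hypothetical sample data standing in for the emails stored in the database.
emails = [
  "alice@example.com",
  "bob@@example.com",
  "carol example@example.com",
  "dave@example"
]

# Records that do NOT match the regex are the ones treated as flawed.
flawed = emails.reject { |email| EMAIL_RE.match?(email) }
puts flawed.inspect
```

With this sample data, only "alice@example.com" passes; the double "@", the embedded space and the missing dot all cause a mismatch.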
But the regex does not catch the following string, which I inserted into the database as a flag; it still matches as a valid email:
"\u200btest@example.com".encode('utf-8')
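A short sketch of why this string slips through. Its first character is U+200B ZERO WIDTH SPACE, which is itself valid UTF-8, and in Ruby `\s` only matches ASCII whitespace (`[ \t\r\n\f\v]`), so `[^@\s]` happily accepts U+200B. `EMAIL_RE` and `flagged` are illustrative names.

```ruby
EMAIL_RE = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

flagged = "\u200btest@example.com".encode('utf-8')

# The string is valid UTF-8; its first character is U+200B ZERO WIDTH SPACE.
puts flagged.valid_encoding?          # true
puts format("U+%04X", flagged.ord)    # U+200B

# Ruby's \s only covers ASCII whitespace, so U+200B is accepted by [^@\s]
# and the whole string matches the regex.
puts "\u200b".match?(/\s/)            # false
puts EMAIL_RE.match?(flagged)         # true
```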
How can I improve this regex so that my validation rejects strings like this and these encoding issues can't break my login?
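One possible direction, sketched under the assumption that invisible characters are the real problem: exclude Unicode whitespace (`\p{Space}`) and Unicode "format" characters (`\p{Cf}`) from the character classes, and strip the same characters with gsub when cleaning. Note that U+200B is category Cf, not Unicode whitespace, so `\p{Space}` alone would not catch it. `STRICT_EMAIL_RE` and `clean_email` are hypothetical names, not part of any library.

```ruby
# Stricter variant (an assumption, not a standard): besides "@", the local
# part and domain labels also exclude Unicode whitespace (\p{Space}) and
# invisible format characters (\p{Cf}), which include U+200B.
STRICT_EMAIL_RE = /\A[^@\p{Space}\p{Cf}]+@([^@\p{Space}\p{Cf}]+\.)+[^@\W]+\z/

# Hypothetical cleanup helper: remove the same invisible characters with gsub.
def clean_email(email)
  email.gsub(/[\p{Space}\p{Cf}]/, "")
end

flagged = "\u200btest@example.com"

puts STRICT_EMAIL_RE.match?(flagged)               # false
puts STRICT_EMAIL_RE.match?(clean_email(flagged))  # true
puts clean_email(flagged) == "test@example.com"    # true
```

Keep in mind that `clean_email` also removes ordinary spaces anywhere in the string. If the application only ever needs ASCII addresses, adding `email.ascii_only?` as an extra check is a simpler, stricter alternative.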