-3

How can I remove all the email addresses from a given file?

Sample file mail.txt:

Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting 
sharp @2PM.okay see you  bye@yahoo.co.in olad-hola

Expected output:

Hello from me
how are you?
 to about the meeting 
sharp @2PM.okay see you olad-hola
NIMI

3 Answers

1

Have a look at this question for a regular expression that matches e-mail addresses. You can then use the re module from the standard library to replace every match of that regular expression with an empty string.

Using the regular expression from the accepted answer to the linked question, we have:

import re

with open("mail.txt") as f:
    content = f.read()

    pattern = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""
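    # "<removed>" marks each match so the result is easy to inspect;
    # pass "" instead to delete the addresses outright.
    # The pattern lists only lowercase letters, so match case-insensitively.
    replaced_content = re.sub(pattern, "<removed>", content, flags=re.IGNORECASE)

    print(replaced_content)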

This prints the following text, with <removed> marking each address that was found:

Hello from me
how are you?
<removed> to <removed> <removed> about the meeting 
sharp @2PM.okay see you  <removed> olad-hola
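
Since the question asks to remove the addresses from the file itself, the cleaned text can also be written back to disk. The following is only a minimal sketch, assuming the whole file fits in memory and is UTF-8 encoded. The helper name remove_emails_from_file and the short stand-in pattern are made up for illustration and are not part of the answer above; the long pattern can be compiled and passed in instead if stricter matching is wanted.

```python
import re
from pathlib import Path

# Stand-in pattern (an assumption, chosen for brevity):
# non-whitespace text, "@", non-whitespace text.
SIMPLE_EMAIL = re.compile(r"\S+@\S+")


def remove_emails_from_file(path, pattern=SIMPLE_EMAIL, replacement=""):
    """Rewrite the file at `path` with every match of `pattern` replaced.

    Returns the number of replacements made.
    """
    file_path = Path(path)
    content = file_path.read_text(encoding="utf-8")
    # subn() returns the new text and how many matches were replaced.
    cleaned, count = pattern.subn(replacement, content)
    file_path.write_text(cleaned, encoding="utf-8")
    return count
```

Calling remove_emails_from_file("mail.txt") overwrites mail.txt in place, so it may be worth keeping a copy of the original file first.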
darcamo
  • Is there any other way of pattern searching? – NIMI Sep 23 '20 at 21:30
  • You need to read the whole file and you need a way to identify which substrings are considered an "email". I'm not sure if there is another way besides using regular expressions or doing the same thing a regular expression does by hand. – darcamo Sep 23 '20 at 21:37
1

You can use re.sub() from the re module to replace the addresses with an empty string.

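import re

with open("mail.txt", "r") as f:
    text = f.read()

# \S+@\S+ matches any run of non-whitespace characters that contains an @
# with at least one character on each side of it.
clean_text = re.sub(r"\S+@\S+", "", text)
print(clean_text)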

This uses a simplified regular expression that also matches invalid e-mail addresses: it removes every run of non-whitespace characters that has at least one character on each side of an @, including any punctuation attached to the address. Since you don't need to verify that the addresses are correct, this is not a problem.
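
Plain removal leaves the surrounding spaces behind, so the result differs slightly from the expected output in the question. One possible variant, sketched under the assumption that the spaces or tabs directly before each address should disappear together with it (the name strip_emails is made up for illustration):

```python
import re

# Optional spaces/tabs, then non-whitespace text containing an "@" with
# at least one character on each side. Newlines are never consumed.
EMAIL_WITH_LEADING_SPACE = re.compile(r"[ \t]*\S+@\S+")


def strip_emails(text):
    """Remove address-like tokens together with the blanks before them."""
    return EMAIL_WITH_LEADING_SPACE.sub("", text)


sample = (
    "Hello from me\n"
    "how are you?\n"
    "shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting \n"
    "sharp @2PM.okay see you  bye@yahoo.co.in olad-hola"
)
print(strip_emails(sample))
```

For the sample text this reproduces the expected output from the question. The token @2PM.okay is kept because nothing precedes its @ within the token.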

Wups
0

You can do it like this:

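# A triple-quoted string keeps the original line breaks.
a = '''Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting
sharp @2PM.okay see you  bye@yahoo.co.in olad-hola'''

for word in a.split():
    # Treat a word as an address only if '@' has text on both sides,
    # so that "@2PM.okay" is kept.
    if '@' in word.strip('@'):
        a = a.replace(word, '')

print(a)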
ewokx
Erol