I want to remove the From, To, Cc, Subject and Sent header lines from this text document and keep only the body of each mail, so that I can summarize the content of the document. What is the best way to do this in Python? I think it is better to do the extraction first and the preprocessing afterwards. The get_payload / is_multipart part of my code does not work properly. That is where my doubt is, so I have marked that part with comments and need help there. Any suggestions would be really helpful.
The code and the .txt file are attached below for reference.
import os, sys, csv
import glob
import re
import email
#from tika import parser
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
# Note: gensim.summarization was removed in gensim 4.0, so this needs gensim < 4.0
from gensim.summarization import summarize, keywords

# Set path to directory where files are
dirs = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
#os.chdir(dirs)
for filename in glob.glob(os.path.join(dirs, '*.txt')):
    try:
        # Read each file once; the with-block closes it afterwards
        with open(filename, 'r', encoding='utf-8') as file:
            filecontents = file.read()
        print(filecontents)
        # Parse before collapsing whitespace: the parser needs the line breaks
        b = email.message_from_string(filecontents)  # NEED
        if b.is_multipart():  # HELP
            for payload in b.get_payload():  # HERE
                # if payload.is_multipart(): ...  # SO
                print(payload.get_payload())  # COMMENTED
        else:
            print(b.get_payload())
        # Collapse runs of whitespace for the summarizer
        filecontents = re.sub(r'\s+', ' ', filecontents).strip()
        summary = summarize(filecontents, ratio=0.10)
        print(summary)
        kw = keywords(filecontents, words=15)
        print(kw)
        #writer.writerow([file, summary, kw])
    except Exception as e:
        # Report the failure instead of silently ignoring it
        print('Failed to process', filename, ':', e)
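For reference, here is a minimal sketch of how is_multipart and get_payload-style access is usually done, assuming the input is a real RFC 822 / MIME message (header lines, a blank line, then the body). The helper name extract_plain_text is made up for illustration. The attached .txt file does not look like such a message (it is a printed thread with several From/Sent/To blocks and no MIME boundaries), so this may not apply to it directly.

```python
import email
from email import policy


def extract_plain_text(raw_message: str) -> str:
    """Return the text/plain parts of an RFC 822 message string, joined by newlines."""
    # policy.default gives EmailMessage objects, which provide get_content()
    msg = email.message_from_string(raw_message, policy=policy.default)
    parts = []
    # walk() yields the message itself and every nested sub-part
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() == "text/plain":
            parts.append(part.get_content().strip())
    return "\n".join(parts)
```

Using walk() avoids writing the nested "if payload.is_multipart()" recursion by hand, since it visits sub-parts at any depth.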
TEXT FILE
Stephanie /ANN
From: Mr.A, <.Mr.A@abc.com>
Sent: Wednesday, July 25, 2018 2:27 PM
To: , Tim /ANN; Abd, May /ANN
Cc: Mr.A, ; Theoder Jerry,
Subject: [EXTERNAL] RE: Holdings: XXXX SPA – mfno.1322
Dear Dr. Tim A. ,
The option-2 is fine. By the way, we had received in the past Letter of Authorization for many companies other
than Spa and I guess Xxxx does not do bANNiness with them either. If yes, then need to submit withdrawal
of Letter of Authorization for those companies and send a Letter of Authorization for spa. stating for any
applications submitted. We will send an administrative filing issue letter for both the holder and the agent.
Thank you!
Regards,
Mr.A
PRODUCT Master File
CDER
Currently, there is no requirement to submit or resubmit NAs in any electronic format. However, starting May 5, 2018,
new NAs, as well as any submissions to the existing NAs mANNt be submitted electronically in legal (electronic Common
Technical Document) format specified by GROUP A in the legal guidance. NA submissions that are not submitted in legal
format after this date may be subject to rejection. For more information please check the NA website
www.GROUP A.gov/abc/bca
This communication is an informal communication consistent with which represents my best judgment
at this time, but does not constitute an advisory opinion, does not necessarily represent the formal position of the
GROUP A, and does not bind or otherwise obligate or commit the agency to the views expressed. This communication,
including any attachments, is intended only for the person or entity to which it is addressed and may contain
confidential material. Any review, retransmission, distribution or other ANNe of this information by persons or entities
other than the intended recipient is prohibited. If you received this in error, please destroy any copies, contact the
sender and delete the material from any computer. Thank you.
From: Tim.@xxxx.com [mailto:Tim.@xxxx.com]
Sent: Wednesday, July 25, 2018 2:10 PM
To: Mr.A, <.Mr.A@abc.com>
Cc: May.Abd@xxxx.com
Subject: RE: Holdings: XXXX SPA ‐ dm 013383
Dear ,
XXXX
2
Thanks for your phone call to clarify your needs and to understand the situation. I have confirmed that Xxxx only does
direct bANNiness for test S intermediate with b. and not with the other companies (e,
x, etc.) that are secondary companies. Based on our discANNsion, I believe that we do not need to
provide QAs for these secondary companies or mention them in our NA file as they would be covered under a
separate QA S.p.A. to them. If this is correct, then I believe you mentioned that we have two options as
described below:
Option 1: We can issue a separate QA for each . NA to be specific on which NA is being cross‐referenced
to our NA 13383.
Option 2: We can do a single QA for and mention that they can cross‐reference any of their NAs. This
would allow them to cross‐reference any of their
If I have misunderstood or am incorrect in my response and we need to discANNs further, please let me know.
If not, when you issue your request, can you please send to me and May Abd by email?
Kind regards.
Tim
Tim A. , BsC
Director, YY SERVICES)
Xxxx ANN
Phone/FAX: 2312333
Cell: 23312123131
Email: tim.@xxxx.com
From: , Tim /ANN
Sent: Monday, July 23, 2018 7:05 AM
To: 'Mr.A, '
Cc: Abd, May /ANN
Subject: RE: [EXTERNAL] Holder: XXXX SPA - NA 013383
Dear ,
May is now on vacation and I am covering for her during her absence. Is there a good time to call you today or later this
week? Please let me know and we can schedule or please call my cell phone 21313131231 at your convenience.
Kind regards.
Tim
Tim A. , MSC
Director, PQR
Xxxx
Phone/FAX: 2312313313
Cell: 3142342424
Email: tim.@xxxx.com
XXXX
3
‐‐‐‐‐‐‐‐‐‐ Forwarded message ‐‐‐‐‐‐‐‐‐‐
From: "Mr.A, " <.Mr.A@abc.com>
Date: Jul 20, 2018 9:01 AM
Subject: [EXTERNAL] Holder: XXXX SPA ‐ NA 013383
To: "TRETE/ANN" <May.Abd@xxxx.com>
Cc: "mno.com>
Dear May Abd,
. I need to talk to you on this.
Thank you!
Regards,
Mr.A
PRODUCT Master File
CDER
Currently, there is no requirement to submit or resubmit NAs in any electronic format.
format after this date may be subject to rejection. For more information please check the NA website
www.GROUP A./cder/NA
This communication is an informal communication which represents my best judgment
at this time, but does not constitute an advisory opinion, does not necessarily represent the formal position of the
GROUP A, and does not bind or otherwise obligate or commit the agency to the views expressed. This communication,
including any attachments, is intended only for the person or entity to which it is addressed and may contain
confidential material. Any review, retransmission, distribution or other ANNe of this information by persons or entities
other than the intended recipient is prohibited. If you received this in error, please destroy any copies, contact the
sender and delete the material from any computer. Thank you.
XXXX
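
One possible approach for a file like the one above, sketched under assumptions: the headers appear as plain lines starting with From:, Sent:, To:, Cc:, Subject: or Date:, so they can be filtered out line by line with a regular expression before summarizing. The names strip_headers, HEADER_LINE and FORWARD_MARKER are hypothetical, and the label list is only what shows up in this sample. A body line that happens to start with one of these labels would also be dropped, and header values wrapped onto a second line would not be caught.

```python
import re

# Header labels seen in the sample thread (assumption: extend as needed)
HEADER_LINE = re.compile(r"^\s*(From|Sent|Date|To|Cc|Bcc|Subject)\s*:", re.IGNORECASE)
# Separator lines such as "---------- Forwarded message ----------";
# the class also covers Unicode hyphens/dashes (U+2010 to U+2015)
FORWARD_MARKER = re.compile(
    r"^\s*[-\u2010-\u2015]{3,}.*(Forwarded|Original) message", re.IGNORECASE
)


def strip_headers(text: str) -> str:
    """Drop header-like lines and forward separators, keep everything else."""
    kept = []
    for line in text.splitlines():
        if HEADER_LINE.match(line) or FORWARD_MARKER.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)
```

The result still contains greetings, signatures and the repeated disclaimer, so further cleanup may be needed before passing it to a summarizer.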