I have a .txt file that contains multiple newspaper articles. Each article has a headline, the author name etc. I want to read the whole .txt file in R and remove every line + the next 5 lines that starts with certain words. I think gsub + reg
expression might be the solution, but I do not know how to define it like the way so that not only the line containing these words is deleted, but also the next 5 lines.
Edit:
The txt. file consists of 200 Washington Post articles. Each article ends with:
lydia.depillis@washpost.com
LOAD-DATE: July 14, 2013
LANGUAGE: ENGLISH
PUBLICATION-TYPE: Web Publication
Copyright 2013 Washingtonpost.Newsweek Interactive Company, LLC d/b/a Washington
Post Digital
All Rights Reserved
4 of 200 DOCUMENTS
Washington Post Blogs
In the Loop
June 28, 2013 Friday 3:08 PM EST
Whenever an e-mail address appears, I want to delete everything until the line where a date appears so that we have a smooth transition to the next article. I want to use a sentiment analysis and thus don't need these lines.