There is some text I need from a web page whose length changes somewhat from day to day, and I'm looking to download that text periodically. I do not want the several dozen lines at both the beginning and the end of the roughly 250-line page. Since the total number of lines on the page is unpredictable, I need to establish the beginning and end points for the deletion based on bits of text that do not change from day to day. I've identified those target text patterns, so I'm looking to parse the content on them such that the unwanted lines get deleted in the resulting document. I want to use command-line utilities for this, since I'd like to automate the process as a cron job.
The download method of choice is lynx -dump www.specified.url > my-download.txt
That part is working fine. But processing the dump to cut off the unwanted beginning and ending lines is, so far, not working. I found a sed example that, it seems, should do what I need:
sed -n '/Phrase toward the beginning/,/Phrase toward the end/p' file_to_parse.txt >parsed_file.txt
It works partially: it cuts off the file's beginning at the right point (all lines preceding "Phrase toward the beginning"). But I cannot seem to make it cut the lines from the end, i.e., the lines following "Phrase toward the end." All my attempts using this formula have so far not touched the end of the file at all. I should probably mention that most of the lines in the dump lynx produces begin, for whatever reason, with three blank spaces--including the "Phrase toward the end" line I'm trying to specify as the point after which further lines should be deleted.
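To make the problem concrete, this is the sort of minimal test I've been running with that sed formula, using a mock-up of the dump (the real phrases differ, of course), complete with the three leading spaces:

```shell
# Build a small sample that mimics the lynx dump, with three
# leading spaces on each line, as in the real download.
cat > sample.txt <<'EOF'
   Unwanted line 1
   Phrase toward the beginning
   Wanted line 1
   Phrase toward the end
   Unwanted line 200
EOF

# Print only the lines from the first marker phrase through the
# second, inclusive. The patterns are unanchored, so the leading
# spaces should not prevent a match.
sed -n '/Phrase toward the beginning/,/Phrase toward the end/p' sample.txt > parsed.txt
cat parsed.txt
```

On this toy sample, at least, only the three marker-to-marker lines should survive into parsed.txt.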
I assume there is more than one utility that can do the sort of parsing I'm after--sed and awk are the likely candidates I can think of. I tend to gravitate toward sed, since its workings are slightly less mysterious to me than awk's. But truth be told, I have only the vaguest conception of how to use sed, and when it comes to using or understanding awk, I get lost very, very quickly. Perhaps there are other utilities that can, based on textual patterns, lop off portions of the beginning and end of a text file?
Input on how I might use sed, awk--or any other similar utility--to accomplish my goal will be appreciated. This is to be done on an Ubuntu machine, btw.
LATER EDIT: sorry for not having posted an example. The downloaded page will look something like the following:
Unwanted line 1
Unwanted line 2
Unwanted line 3
Unwanted line etc
Phrase toward the beginning
Wanted line 1
Wanted line 2
Wanted line 3
Wanted line ca 4-198
Phrase toward the end
Unwanted line 200
Unwanted line 201
Unwanted line 202
Unwanted line . . . (to end of file)
The final output, on the other hand, should look like this:
Phrase toward the beginning
Wanted line 1
Wanted line 2
Wanted line 3
Wanted line ca 4-198
Phrase toward the end
I hope things are clearer now. Please do bear in mind, though I've used line numbers to help illustrate what I'm aiming to do, that I will be unable to base the desired deletions on line numbers, owing to the unpredictable ways in which the page I'm downloading changes.
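For whatever it's worth, I also ran across an awk form that is claimed to support the same /start/,/stop/ range notation as sed (I can't vouch for it being the idiomatic approach). Run against the sample above, it would look like this:

```shell
# Recreate the sample page from the edit above (abbreviated).
cat > my-download.txt <<'EOF'
Unwanted line 1
Phrase toward the beginning
Wanted line 1
Phrase toward the end
Unwanted line 200
EOF

# awk range pattern: with no action given, every line from the
# first match through the second match is simply printed.
awk '/Phrase toward the beginning/,/Phrase toward the end/' my-download.txt > parsed_file.txt
cat parsed_file.txt
```

If that behaves the same as the sed version, either utility would presumably do for the cron job.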