I'm trying to write a small bash script that:
- Uses wget to download an HTML file from the web every [x] minutes
- Uses some Linux utility to find the differences in the file between the last two downloads
- Uses sed to modify the lines on which new text was detected
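For reference, a rough sketch of the loop described in the steps above. Everything named here is a placeholder and not from a real setup: `URL`, `INTERVAL_MIN`, the file names, and the `new_lines` and `watch_page` functions are all assumptions. It relies on plain `diff` output, where added or changed lines are prefixed with `> `.

```shell
#!/bin/sh
# Placeholders (assumptions, not real values):
URL="https://example.com/page.html"   # the page to watch
INTERVAL_MIN=5                        # the [x] minutes between downloads

# Print only the lines that appear in the new file ($2) but not in the old one ($1).
# In diff's default output format those lines start with "> ".
new_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Download the page repeatedly and report the new lines after each download.
watch_page() {
    while true; do
        if [ -f current.html ]; then
            mv current.html previous.html
        fi
        wget -q -O current.html "$URL"
        if [ -f previous.html ]; then
            new_lines previous.html current.html
        fi
        sleep $((INTERVAL_MIN * 60))
    done
}

# watch_page   # uncomment to start polling
```

The sed step for modifying the new lines would then work on the output of `new_lines`. Since diff compares line by line, this only gives useful results when the file has more than one line.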
The problem I am running into is that the HTML file uses inline CSS to format a table, and the actual code for the whole page is stored on one long line.
Effectively, I need a Linux utility that can scan through a single line of code, find every piece of text between tags, and put each piece on its own line. That should make the text easier to scan. Every tool I've tried works on a per-line basis, which doesn't help here because the entire page is stored on a single line.
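To make the goal concrete, here is a minimal sketch of the kind of transformation meant here, using only `tr` and `sed`. The function name `extract_text` and the sample HTML are made up for illustration, and the approach assumes there is no literal `<` or `>` inside the text or inside attribute values. It is not a real HTML parser.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the text found between
# tags, one piece per line.
# Assumption: no literal '<' or '>' inside text or attribute values.
extract_text() {
    # 1. Break the single long line at every '<', so each tag starts a line.
    # 2. Drop everything up to and including the tag's closing '>'.
    # 3. Delete the lines that are left empty or whitespace-only.
    tr '<' '\n' |
        sed -n 's/^[^>]*>//p' |
        sed '/^[[:space:]]*$/d'
}

# Made-up one-line sample with inline CSS; prints "foo" and "bar" on separate lines.
printf '%s\n' '<table><tr><td style="color:red">foo</td><td>bar</td></tr></table>' | extract_text
```

With the text split out like this, the output of two downloads could be compared line by line with diff.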