I'm trying to find a way to clean some very sloppy HTML (machine generated).
I assume regex is the right tool for this, but I'm not sure where to start.
I'd like to convert HTML like...
the <div>government’s</div> “risk management” efforts. As <br />
<span style="line-height:1.6em">critical infrastructure provides</span><br>
to HTML like...
the government's "risk management" efforts. As critical infrastructure provides
This means replacing or removing several different tags, attributes and characters...
= ' '
<br /> = ' '
<br> = ' '
“ = "
” = "
’ = '
<span> = REMOVE
<div> = REMOVE
style = REMOVE
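For illustration, here is a rough sketch of how the mapping above might look in Python using the standard `re` module. The function name `clean_html` is hypothetical, and collapsing the leftover whitespace and line breaks into single spaces is an assumption based on the desired output shown earlier. It only covers the entries in the list that have a visible left-hand side, and it relies on the markup being predictable machine output; regex like this is not a general HTML parser.

```python
import re

# Literal character replacements taken from the list above.
CHAR_MAP = {
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2019": "'",  # right curly single quote / apostrophe
}


def clean_html(text: str) -> str:
    """Apply the replacements from the list (hypothetical helper)."""
    # <br>, <br/> and <br /> each become a single space.
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    # Drop opening and closing <span>/<div> tags, attributes included.
    text = re.sub(r"</?(?:span|div)\b[^>]*>", "", text, flags=re.IGNORECASE)
    # Strip a style="..." attribute from any tag that is left.
    text = re.sub(
        r"""(<[^>]*?)\s+style\s*=\s*(?:"[^"]*"|'[^']*')""",
        r"\1",
        text,
        flags=re.IGNORECASE,
    )
    # Swap curly quotes for straight ones.
    for old, new in CHAR_MAP.items():
        text = text.replace(old, new)
    # Assumption: runs of whitespace (including newlines) collapse to one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = (
        "the <div>government\u2019s</div> \u201crisk management\u201d efforts. As <br />\n"
        '<span style="line-height:1.6em">critical infrastructure provides</span><br>'
    )
    print(clean_html(sample))
```

Run against the sample input from earlier, this should print the single cleaned line shown as the target. The same patterns could also be pasted one at a time into the regex find-and-replace of an editor such as Sublime Text.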
I have several different text editors (Sublime Text, TextMate, etc.) and I'm open to using apps, AppleScript or anything else that saves me from having to search for each of these manually.
Thanks for any help.