Regex for Parsing HTML NewLine

Question

<tr><th>Biography</th>
<td>   A bunch of random info here   <td>

I am trying to get all the content after the Biography line and the opening <td> tag on the next line. I tried the newline character both without brackets, (?<=Biography\n).{1,50}, and inside brackets, (?<=Biography[\n]).{1,50}, because I am not sure which form matches the characters starting on the next line. Both return nothing. What is the correct way to match the newline character in a string of HTML data?
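The question does not say which regex engine is used, so the following is a minimal sketch assuming Python's re module. It suggests why both patterns come back empty: \n and [\n] match the same single character, but in the sample "Biography" is followed by "</th>", not directly by a newline, so neither lookbehind can succeed. The variable names are illustrative only.

```python
import re

# The sample from the question, with a real newline ending the first row.
HTML = "<tr><th>Biography</th>\n<td>   A bunch of random info here   <td>"

# Both of the question's patterns require "\n" immediately after "Biography",
# but the text continues with "</th>", so neither lookbehind can match.
# (\n and [\n] match exactly the same single character.)
PLAIN = re.search(r"(?<=Biography\n).{1,50}", HTML)
BRACKETED = re.search(r"(?<=Biography[\n]).{1,50}", HTML)

# Including the closing tag in the lookbehind lets it succeed.
WITH_TAG = re.search(r"(?<=Biography</th>\n).{1,50}", HTML)

# Python's lookbehind must be fixed-width, so a capture group is used here;
# \s* then absorbs any whitespace (including the newline) between the tags.
CAPTURED = re.search(r"Biography</th>\s*<td>(.*?)<td>", HTML)

print(PLAIN, BRACKETED)                 # None None
print(repr(WITH_TAG.group()))           # '<td>   A bunch of random info here   <td>'
print(repr(CAPTURED.group(1).strip()))  # 'A bunch of random info here'
```

This only explains the empty result; the capture-group pattern still depends on the exact tag layout and whitespace, which is the brittleness the comment and answer below warn about.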

http://blog.codinghorror.com/parsing-html-the-cthulhu-way/ – Quentin Oct 11 '15 at 20:30

score 2 · Answer 1 · answered Oct 11 '15 at 20:33

Never parse HTML with regex!

A solution using a proper parser:

$ saxon-lint --html --xpath '//*[.="Biography"]/../td/text()' file
A bunch of random info here

Check https://github.com/sputnick-dev/saxon-lint
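For readers without saxon-lint, here is a rough stand-in for that XPath query, sketched under the assumption of Python 3 and its standard-library html.parser. This swaps in a different technique (an event-driven parser rather than XPath), and the class name BiographyExtractor is hypothetical. Unlike a full HTML5 parser, it builds no tree and does not infer missing closing tags; it only tracks which cell it is in.

```python
from html.parser import HTMLParser


class BiographyExtractor(HTMLParser):
    """Collect text of <td> cells in a row whose <th> text is 'Biography'.

    A rough equivalent of //*[.="Biography"]/../td/text(), not saxon-lint's API.
    """

    def __init__(self):
        super().__init__()
        self.current_tag = None        # 'th' or 'td' while inside such a cell
        self.in_biography_row = False  # set once the row's header is 'Biography'
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.in_biography_row = False  # a new row starts: forget the old header
        if tag in ("th", "td"):
            self.current_tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace-only text such as the newline between rows
        if self.current_tag == "th" and text == "Biography":
            self.in_biography_row = True
        elif self.current_tag == "td" and self.in_biography_row:
            self.results.append(text)


html = "<tr><th>Biography</th>\n<td>   A bunch of random info here   <td>"
parser = BiographyExtractor()
parser.feed(html)
parser.close()
print(parser.results)  # ['A bunch of random info here']
```

Because the parser tokenizes tags itself, the newline between </th> and <td> is just whitespace to skip, which is the problem the regex in the question was struggling with.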

Gilles Quénot