<tr><th>Biography</th>
<td>   A bunch of random info here   </td>

I am trying to capture all content after the Biography line and the opening tag on the next line. I have tried the newline character without brackets, (?<=Biography\n).{1,50}, and with brackets, (?<=Biography[\n]).{1,50}, but I am not sure which one would match all characters starting at the next line. Both return nothing. What is the correct way to match the newline character in a string of HTML data?

dachizzle37

1 Answer

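Never parse HTML with regex! Both patterns return nothing because, in your sample, the closing </th> tag sits between "Biography" and the newline, so the lookbehind for "Biography\n" never matches.

A solution using a proper parser: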

$ saxon-lint --html --xpath '//*[.="Biography"]/../td/text()' file
A bunch of random info here 

Check https://github.com/sputnick-dev/saxon-lint
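
If saxon-lint is not available, the same idea can be sketched with the html.parser module from Python's standard library. This is only a minimal sketch under a few assumptions: the class name BiographyExtractor and the helper extract_biography are hypothetical names chosen for illustration, and the sample string assumes the cell is closed with </td>. It collects the text of the td cell that follows the th cell whose text is "Biography", within the same row.

```python
from html.parser import HTMLParser


class BiographyExtractor(HTMLParser):
    """Collect text from <td> cells that follow a <th>Biography</th> in the same row."""

    def __init__(self):
        super().__init__()
        self.in_th = False            # currently inside a <th> element
        self.after_biography = False  # a "Biography" header was seen in this row
        self.in_td = False            # currently inside a <td> we want to capture
        self.chunks = []              # captured text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: forget any header seen in the previous row.
            self.after_biography = False
            self.in_td = False
        elif tag == "th":
            self.in_th = True
        elif tag == "td" and self.after_biography:
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False
        elif tag == "td":
            self.in_td = False
        elif tag == "tr":
            self.in_td = False
            self.after_biography = False

    def handle_data(self, data):
        if self.in_th and data.strip() == "Biography":
            self.after_biography = True
        elif self.in_td:
            self.chunks.append(data)


def extract_biography(html):
    """Return the stripped text of the cell(s) after the Biography header."""
    parser = BiographyExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks).strip()


sample = "<tr><th>Biography</th>\n<td>   A bunch of random info here   </td></tr>\n"
print(extract_biography(sample))
```

Because the parser tracks elements rather than characters, the newline between the two cells does not matter, which is the point of avoiding a regex here.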

Gilles Quénot