
Suppose my HTML looks like this:

html = '<HTML><BODY><a id="id1">test</a><a id="id2">test2</a></BODY></HTML>'

I extract the 2nd link: node = doc.css("a#id2")[0]

How do I get the starting index of this node's HTML within the HTML source? In this example it would be 32.

html.slice(32, SOMETHING) # => '<a id="id2">...'

Note: I know this is a trivial example, but the solution should also handle cases where the HTML of the node I extract isn't unique in the source.

Henley
You can convert the doc to a String with .text and use this answer https://stackoverflow.com/a/3520277/2067375 to scan the string with a regular expression; you will get the starting position of each match. – Rada Bogdan Jun 26 '17 at 20:17
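
The regex-scanning idea from this comment can be sketched in plain Ruby roughly as follows. This is only a minimal sketch, not a parser feature: `opening_tag_offsets` is a hypothetical helper, and it assumes you scan the raw source string (the `html` variable, not the node's text content) and that you already know the node's ordinal among same-named elements in document order. A regex will also count tags that appear inside comments, CDATA, or script content, so on such input the offsets may not line up with what the parser sees.

```ruby
# Hypothetical helper (not part of any library): returns the start offset of
# every opening tag named `tag_name` in the raw source, in document order.
def opening_tag_offsets(source, tag_name)
  # Matches "<a" followed by a word boundary, so "</a>" and "<abbr" are skipped.
  pattern = /<#{Regexp.escape(tag_name)}\b/i
  offsets = []
  # Inside the block, Regexp.last_match is the MatchData for the current match.
  source.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

html = '<HTML><BODY><a id="id1">test</a><a id="id2">test2</a></BODY></HTML>'

offsets = opening_tag_offsets(html, "a")

# Assumption: the node of interest is the 2nd <a> in document order,
# so its zero-based ordinal is 1.
start = offsets[1]

puts start
puts html[start, 12]
```

For the example source this prints 32 and `<a id="id2">`. Selecting the match by ordinal, rather than by searching for the node's own markup, is what keeps the approach usable when the node's HTML is not unique in the source.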

0 Answers