I am working on an AIR app and trying to extract a block of text from a web page. The block always sits between two specific strings, and those strings contain links that change from page to page.

It looks something like this:

<p><a href="**changes**">Previous Chapter</a> <a href="**changes**"><span style="float: right">Next Chapter</span></a></p>
.
.
_desired content_
.
.
<p><a href="**changes**">Previous Chapter</a> <a href="**changes**"><span style="float: right">Next Chapter</span></a></p>

*The two strings are identical

I have tried several regular expressions, but without success. I just can't get my head around RegEx in general...

The last expression I've tried is: /(?<=<p><a href=\".+\">Previous Chapter<\/a> <span style=\"float: right\"><a href=\".+\">Next Chapter<\/a><\/span><\/p>)(.*)(?=<p><a href=\".+\">Previous Chapter<\/a> <span style=\"float: right\"><a href=\".+\">Next Chapter<\/a><\/span><\/p>)/gsi
but that one isn't even recognized as a RegEx.

I would really appreciate any help with the subject.

Thanks in advance!

EDIT:

Thanks to Organis's help I managed to solve the problem. It was indeed easier and better NOT to use RegEx. This is what I ended up doing:

text=text.split("Next Chapter<\/span><\/a><\/p>")[1].split("Previous Chapter<\/a>")[0];
text=text.substring(0,text.lastIndexOf("<p><a href"));
Onlu

1 Answer

Do not use RegEx. Read why: https://blog.codinghorror.com/parsing-html-the-cthulhu-way/.

Extract the text between the two occurrences of the fixed string <span style="float: right">Next Chapter</span></a></p>, then cut the trailing <p><a href="**changes**">Previous Chapter</a> <a href="**changes**"> off the end.
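
The two steps above can be sketched roughly as follows. This is only an illustration, written in TypeScript rather than ActionScript 3; it relies solely on indexOf, lastIndexOf and substring, which the AS3 String class also provides. The names extractChapter, NAV_TAIL and NAV_HEAD are hypothetical, and the sketch assumes the page contains the navigation paragraph exactly as shown in the question.

```typescript
// Hypothetical helper, not code from the original thread.
// NAV_TAIL is the fixed end of each navigation paragraph (no changing links in it).
const NAV_TAIL = 'Next Chapter</span></a></p>';
// NAV_HEAD is the fixed beginning of each navigation paragraph.
const NAV_HEAD = '<p><a href';

function extractChapter(html: string): string | null {
  // The first fixed tail closes the top navigation paragraph.
  const first = html.indexOf(NAV_TAIL);
  if (first === -1) return null;
  const contentStart = first + NAV_TAIL.length;

  // The next fixed tail closes the bottom navigation paragraph.
  const second = html.indexOf(NAV_TAIL, contentStart);
  if (second === -1) return null;

  // Search backwards from there for the opening of the bottom navigation
  // paragraph, so the changing hrefs are cut off together with it.
  const navStart = html.lastIndexOf(NAV_HEAD, second);
  if (navStart < contentStart) return null;

  return html.substring(contentStart, navStart);
}

// Usage on a made-up page; the hrefs are placeholders.
const sampleNav =
  '<p><a href="/ch1">Previous Chapter</a> <a href="/ch3">' +
  '<span style="float: right">Next Chapter</span></a></p>';
const samplePage = sampleNav + '\n<p>Chapter body</p>\n' + sampleNav;
console.log(extractChapter(samplePage));
```

Returning null when a marker is missing avoids the runtime error that chained split(...)[1] calls would raise on a page with an unexpected layout.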

Organis