I have an HTML page. I want to extract the text from only those tags whose text ends with a question mark. I am using:
<.+?>(.+?)<.+?>
to get the text inside tags, but there are two problems with this: 1- All the nested tags are also extracted, which I don't want (I just want plain text). 2- I only want the text within tags that ends with a question mark.
I don't know how to do this. Can someone help me, please (in Java)? PS: the HTML pages that I have are malformed, so using tools such as JSoup is not an option. That's why I am using regex only.
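To make the two requirements concrete, here is a rough sketch of the kind of thing being asked for, not a definitive solution. It assumes that "plain text" means a run of characters containing no `<` or `>`, sitting directly between the end of one tag and the start of the next, and that the question mark is the last non-whitespace character of that run. The class name `QuestionTextExtractor` and the method `extractQuestions` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionTextExtractor {
    // '>' ends the preceding tag and '<' starts the following one.
    // [^<>]*? keeps any tag out of the captured text, and \? requires the
    // captured text to end with a question mark. Surrounding whitespace
    // is allowed but is not captured.
    private static final Pattern QUESTION_TEXT =
            Pattern.compile(">\\s*([^<>]*?\\?)\\s*<");

    // Returns every tag-free text run that ends with '?'.
    static List<String> extractQuestions(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = QUESTION_TEXT.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from a real page.
        String html = "<p>How are you?</p><p>Fine.</p><div><b>bold</b> Really? </div>";
        for (String q : extractQuestions(html)) {
            System.out.println(q);
        }
    }
}
```

Because the pattern never looks at tag names or nesting, it does not depend on the markup being well-formed. The trade-off is that for mixed content such as `<div><b>bold</b> Really? </div>` it returns only the text run immediately before the next tag (`Really?`), not the whole element's text.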