Hello, I am working on a problem using regular expressions (RE). The task is to extract information from a made-up HTML document: I need the title and the content. This is what I have come up with so far:
<body>([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*<\/body>
I know it's just repeating the same RE, but I couldn't get it to match otherwise, so please help me with that as well.
The title is inside <title> </title> and the content is inside <body> </body>. But there is a problem: I need to ignore all the \n in the text and get only the text.
This is some sample text:
<html>\n<head><title>Some title</title></head>\n<body>Here<p> is some </p>content <a href="www.somesite.com">\nclick</body>\n</html>
Also, I know that I should not parse HTML with RE (see "RegEx match open tags except XHTML self-contained tags"), but my task requires me to use RE.
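
For reference, here is a minimal sketch of a two-step alternative to repeating the same group, assuming Python's `re` module (the task does not fix a language) and assuming the \n in the sample are real newline characters. It captures the title and the body separately, then strips tags and newlines from the captured body. The function name `extract` is hypothetical, and this is only meant for simple made-up input like the sample, not for HTML in general.

```python
import re

def extract(html):
    """Return (title, content) from a small, well-behaved HTML string."""
    # Capture everything between the tag pairs; DOTALL lets '.' span newlines.
    title_match = re.search(r"<title>(.*?)</title>", html, re.DOTALL)
    body_match = re.search(r"<body>(.*?)</body>", html, re.DOTALL)
    title = title_match.group(1) if title_match else ""
    body = body_match.group(1) if body_match else ""
    # Remove every tag inside the body, then remove the newlines.
    content = re.sub(r"<[^>]*>", "", body)
    content = re.sub(r"\n", "", content)
    return title, content

# Sample text from the question, with \n written as real newline escapes.
sample = ('<html>\n<head><title>Some title</title></head>\n'
          '<body>Here<p> is some </p>content '
          '<a href="www.somesite.com">\nclick</body>\n</html>')

title, content = extract(sample)
print(title)    # Some title
print(content)  # Here is some content click
```

Splitting the work into "capture the body" and "strip the tags" avoids having to guess how many tags the body contains, which is what forces the repeated groups in the single pattern.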