Hello, I am working on a problem using regular expressions (RE). The task is to extract information from a made-up HTML document: the title and the content are what I need. This is what I have come up with so far:

<body>([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*([^<]*)(?:<[^>]*+>)*<\/body>

I know it's just repeating the same sub-pattern, but I couldn't get it to match otherwise, so please help me with that as well. The title is inside <title> </title> and the content is inside <body> </body>. There is one more problem: I need to ignore every \n in the text and get only the text itself.

Here is some sample text: <html>\n<head><title>Some title</title></head>\n<body>Here<p> is some </p>content <a href="www.somesite.com">\nclick</body>\n</html>

I also know that I should not parse HTML with RE (see "RegEx match open tags except XHTML self-contained tags"), but my task requires me to use RE.
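
To make the goal concrete, here is a rough sketch of the two-step idea I suspect is needed instead of repeating the same group: first capture everything between the tags, then strip the inner tags and the \n in a second pass. It is only an illustration under assumptions: Python's `re` module is my choice here (the task does not fix a language), the names `extract` and `clean` are made up, and it assumes a single <title> and a single <body> with no attributes. It handles \n both as a real newline and as a literal backslash followed by n, since I am not sure which one my input contains.

```python
import re

# Sample input from the question; "\n" inside these strings is a real newline.
SAMPLE = (
    '<html>\n<head><title>Some title</title></head>\n'
    '<body>Here<p> is some </p>content <a href="www.somesite.com">\nclick</body>\n</html>'
)


def clean(text):
    """Remove every tag, then every newline (real or written as backslash-n)."""
    text = re.sub(r"<[^>]*>", "", text)   # drop anything that looks like a tag
    text = re.sub(r"\\n|\n", "", text)    # drop literal "\n" and real newlines
    return text


def extract(html):
    """Return (title, content) using regular expressions only (hypothetical helper)."""
    # Lazy match with DOTALL so the capture may span several lines.
    title_match = re.search(r"<title>(.*?)</title>", html, re.DOTALL)
    body_match = re.search(r"<body>(.*?)</body>", html, re.DOTALL)
    title = title_match.group(1) if title_match else ""
    body = body_match.group(1) if body_match else ""
    return clean(title), clean(body)


title, content = extract(SAMPLE)
print(title)    # Some title
print(content)  # Here is some content click
```

Splitting the work this way means the pattern does not need one capture group per text fragment, so it should not matter how many tags appear inside <body>.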
