I came across an unusual HTML document on the web that has multiple <html> tags nested inside the parent <html> tag. I want to parse the contents of the inner <html> tags as well. Can anyone point me in the right direction?
Thanks in advance.
Edit 1: Using BeautifulSoup
soup = BeautifulSoup(html, "lxml")
gives only the parent <html> and the tags present within it.
However, I am assuming that if a browser is able to render the HTML, BeautifulSoup should be able to parse it. Is that assumption correct?
Edit 2: Actually, the HTML is malformed (I am assuming here). Below is the HTML I am parsing with BeautifulSoup; somehow I am only getting the tables of the first (outermost) <html>. If I manually remove the extra <html> tags and keep only one, I am able to parse the table with BeautifulSoup. So the question is: is there any way to parse the HTML below and get the data from the innermost table, or from all tables, in the file?
<!DOCTYPE html>
<html>
<head>
<title>Some Title</title>
</head>
<body>
some html to display the tables.
<html>
<head></head>
<title>Some other title</title>
<body>
some html to display even more tables.
</body>
</html>
</body>
</html>
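For reference, here is a minimal sketch of one possible workaround that does not depend on how a tree-building parser repairs the nested <html> tags. It uses the event-based `html.parser.HTMLParser` from the Python standard library, which reports every start and end tag as it appears and does not restructure the document. The class name `TableTextCollector` and the two small tables in `sample` are hypothetical placeholders standing in for the "some html to display the tables" parts above; this is not what BeautifulSoup does internally.

```python
from html.parser import HTMLParser


class TableTextCollector(HTMLParser):
    """Collect the text of every <table>, however many <html> tags wrap it.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.html_depth = 0   # number of <html> tags currently open
        self.table_depth = 0  # number of <table> tags currently open
        self.tables = []      # list of (html_depth, [text, ...]) per top-level table

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.html_depth += 1
        elif tag == "table":
            self.table_depth += 1
            if self.table_depth == 1:
                # Record which <html> nesting level this table was found at.
                self.tables.append((self.html_depth, []))

    def handle_endtag(self, tag):
        if tag == "html" and self.html_depth > 0:
            self.html_depth -= 1
        elif tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.table_depth > 0 and text:
            self.tables[-1][1].append(text)


# Same shape as the document above, with made-up tables filled in.
sample = """<!DOCTYPE html>
<html>
<head><title>Some Title</title></head>
<body>
<table><tr><td>outer cell</td></tr></table>
<html>
<head></head>
<title>Some other title</title>
<body>
<table><tr><td>inner cell</td></tr></table>
</body>
</html>
</body>
</html>"""

parser = TableTextCollector()
parser.feed(sample)
parser.close()
print(parser.tables)  # [(1, ['outer cell']), (2, ['inner cell'])]
```

BeautifulSoup can also be built on this same standard-library parser with `BeautifulSoup(html, "html.parser")` instead of `"lxml"`. Different parsers repair invalid markup differently, so it may be worth checking whether that call keeps the inner tables for this file.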