I have the following HTML code:

<li><h3>Number Theory - Even Factors</h3>
    <p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?</p>
    <ol class="xyz">
        <li>1183</li>
        <li>1200</li>
        <li>1050</li>
        <li>840</li>
    </ol>
    <ul class="exp">
        <li class="grey fleft">
            <span class="qlabs_tooltip_bottom qlabs_tooltip_style_33" style="cursor:pointer;">
            <span>
                <strong>Correct Answer</strong>
                    Choice (A).</br>1183
                </span> 
                Correct answer
            </span>
        </li>
        <li class="primary fleft">
            <a href="factors_6.shtml">Explanatory Answer</a>
        </li>
        <li class="grey1 fleft">Factors - Even numbers</li>
        <li class="orange flrt">Medium</li>
    </ul>       
</li>

In the HTML snippet above, I am trying to extract the contents of the <p lang="title"> element. Notice that it uses <sup></sup> and <sub></sub> tags inside.

My XPath expression .//p[@lang="title"]/text() returns only the text nodes and does not retrieve the contents of the sup and sub elements. How do I get the output below?

Desired Output

Number N = 2<sup>6</sup>*5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
PirateApp

1 Answer

XPath

You can get the equivalent of innerHTML by selecting all child nodes with node():

//p[@lang="title"]/node()

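Note that this returns a list of nodes (text nodes and element nodes), not a single string, so each node still has to be serialized and joined.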
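
To illustrate what "serialize and join" means, here is a minimal sketch using only Python's standard library. It is an assumption-laden illustration, not the only way to do it: xml.etree.ElementTree does not support the node() test, so the hypothetical helper inner_markup emulates it by combining the element's leading text with each serialized child. SNIPPET is a cut-down, well-formed copy of the paragraph from the question, since ElementTree only parses valid XML.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed fragment from the question (ElementTree needs valid XML).
SNIPPET = (
    '<li><p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup> * '
    '7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?</p></li>'
)

def inner_markup(element):
    """Return the element's text and child elements serialized as one string."""
    # element.text is the text before the first child element (may be None)
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the child's tail, i.e. the text that follows it
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

root = ET.fromstring(SNIPPET)
paragraph = root.find(".//p[@lang='title']")
result = inner_markup(paragraph)
print(result)
```

The printed string keeps the <sup> tags in place, which is what /text() alone cannot give you because it selects only the text nodes.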

Python

You can get the required innerHTML with the Python code below. It relies on decode_contents, which is provided by BeautifulSoup 4 (the bs4 package):

from bs4 import BeautifulSoup  # decode_contents() requires BeautifulSoup 4

def innerHTML(element):
    """Receive an element and return its innerHTML as a string."""
    return element.decode_contents(formatter="html")

html = """<html>
               <head>...
               <body>...
               Your HTML source code
               ..."""

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find('p', {"lang": "title"})

print(innerHTML(paragraph))

Output:

Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
Andersson