Python 3: extract part of an HTML document with XPath

Question

I want to extract part of the following HTML with XPath in Python. I want the HTML fragment itself, tags and text included. The question "Get all text inside a tag in lxml" only extracts the text of an element, so the two questions are different.

 <html>
 <body> 
 <div class="item">
  <ul>
     <li class="item-0"><a href="link1.html">first item</a></li>
     <li class="item-1"><a href="link2.html">second item</a></li>
     <li class="item-inactive"><a href="link3.html">third item</a> </li>
     <li class="item-1"><a href="link4.html">fourth item</a></li>
     <li class="item-0"><a href="link5.html">fifth item</a></li>
  </ul>
  </div>
  <div  class = "movie">
  <div  title = "name">
  <ul>
     <li class="item-0"><a href="link1.html">movie a</a></li>
     <li class="item-1"><a href="link2.html">movie b</a></li>
     <li class="item-inactive"><a href="link3.html">movie c</a></li>
     <li class="item-1"><a href="link4.html">movie d</a></li>
  </ul>
  </div>
  </div>
  </body>
  </html>

Specifically, I want to extract only the following fragment from the HTML above.

   <div title = "name">   
   <ul>
     <li class="item-0"><a href="link1.html">movie a</a></li>
     <li class="item-1"><a href="link2.html">movie b</a></li>
     <li class="item-inactive"><a href="link3.html">movie c</a></li>
     <li class="item-1"><a href="link4.html">movie d</a></li>
    </ul>
   </div>

My code:

 import requests
 from lxml import html

 page = requests.get('........html')
 tree = html.fromstring(page.content)
 # xpath() returns a list of matching elements
 body = tree.xpath('//div[contains(@title, "name")]')
 print('body:', body)

but the result is

   <Element div at 0x103620e58>

I want to get all the elements in this part of the HTML, for example

   <ul> <li>

Please use XPath, not another method.

Possible duplicate of [Get all text inside a tag in lxml](http://stackoverflow.com/questions/4624062/get-all-text-inside-a-tag-in-lxml) — Rafael Almeida, Jun 07 '16 at 10:19

hr_117 · Accepted Answer · 2016-06-07T10:41:16.510

2

I want to get all the elements in this part html, for example <ul> <li>

Try to use:

  body = tree.xpath('//div[contains(@title, "name")]/ul')

Update (thanks to @RafaelAlmeida): for all elements below the div, use:

  body = tree.xpath('//div[contains(@title, "name")]//*')

edited Jun 07 '16 at 10:41

answered Jun 07 '16 at 10:14

That's not what OP asked for! – Rafael Almeida Jun 07 '16 at 10:21
@RafaelAlmeida: Hm, maybe you are right, because of "get **all** the elements" – hr_117 Jun 07 '16 at 10:42
Thanks @hr_117 and @Rafael Almeida. This is the code that I want: body = tree.xpath('//div[contains(@title, "name")]//*'). You are a good guy and take your answers seriously. Best wishes! Also, if you have any links to materials or tutorials about XPath, please tell me! Anyway, thanks a lot! – tktktk0711 Jun 08 '16 at 01:11
