I have a large HTML document and need to extract certain tags from it.

For example, I want to print everything starting with

<div ........
....
...

until </div>

So the starting keyword is "<div" and the ending keyword is "</div>".

I want to write a Python 2.7 script that prints the opening tag and everything after it, up to and including the closing </div>, but I am stuck at that point.

Example:

<div title="buyer-name">test
       <span class="item-price">ff</span> </div>
martineau

2 Answers

If the strings you are looking for ("<div" and "</div>") are unique, you can use .find() like this:

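my_html = "..."
token_1 = "<div"
token_2 = "</div>"
start = my_html.find(token_1)       # index of the opening tag, or -1 if absent
end = my_html.find(token_2, start)  # look for the closing tag after the opening one
if start != -1 and end != -1:
    # add len(token_2) so the slice includes the closing tag itself
    print(my_html[start:end + len(token_2)])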
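
If the tokens can occur more than once, the same .find() idea can be repeated from the end of each match. The following is only a minimal sketch: the helper name iter_between and the sample string (built from the question's example plus a second, made-up div) are assumptions for illustration, not part of the original answer.

```python
def iter_between(text, start_token, end_token):
    """Yield each span that runs from start_token through the next end_token."""
    pos = 0
    while True:
        start = text.find(start_token, pos)
        if start == -1:
            return  # no more opening tokens
        end = text.find(end_token, start + len(start_token))
        if end == -1:
            return  # opening token without a closing one
        end += len(end_token)
        yield text[start:end]
        pos = end  # continue searching after this span


# Hypothetical sample input based on the question's example
sample = (
    '<p>intro</p>'
    '<div title="buyer-name">test\n'
    '       <span class="item-price">ff</span> </div>'
    '<div>second</div>'
)
spans = list(iter_between(sample, "<div", "</div>"))
print(spans[0])  # the first block, which is what the question asks for
```

Note that plain string searching stops at the first closing tag it meets, so a div nested inside another div would cut the outer block short. An HTML parser is the safer choice in that case.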
CIsForCookies
  • Yes, it is unique, but I want the script to take the first keyword.
    – waterproof Mar 14 '17 at 14:47
  • @waterproof In that case, my code gives you the start and end positions of the string you want to print. After that, just use print(my_html[start]) and increment the start variable until you reach the end position. – CIsForCookies Mar 14 '17 at 14:50

You can use BeautifulSoup:

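from bs4 import BeautifulSoup
html_code = "<html>...</html>"
# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(html_code, "html.parser")
mydivs = soup.find_all('div')  # every <div> element, nested ones included
for div in mydivs:
    print(str(div))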
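
If installing BeautifulSoup is not an option, a similar result can be sketched with the standard library's HTMLParser. This is a different technique from the one in this answer, and the class name DivCollector and the sample string are assumptions made for illustration. It collects each outermost div as one block, so nested divs stay inside their parent.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class DivCollector(HTMLParser):
    """Collect the text of every outermost <div>...</div> block."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0    # number of <div> tags currently open
        self.parts = []   # pieces of the block being collected
        self.blocks = []  # finished blocks

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as written
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append("</%s>" % tag)
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append("".join(self.parts))
                self.parts = []


# Hypothetical sample input based on the question's example
sample = (
    '<p>intro</p>'
    '<div title="buyer-name">test\n'
    '       <span class="item-price">ff</span> </div>'
    '<p>end</p>'
)
parser = DivCollector()
parser.feed(sample)
parser.close()
for block in parser.blocks:
    print(block)
```

The blocks are rebuilt from parser events rather than sliced from the source, so details such as character references or the exact spelling of closing tags may differ slightly from the original markup.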
Yuval Pruss