
I am trying to scrape the text inside the <span class="..."> elements using Python. The HTML on the pages I am scraping looks like this:


    <li class="item">
        <span class="name">Sara</span>
        <span class="value">selling potato in town</span>
    </li>

    <li class="item">
        <span class="name">Grouping</span>
        <span class="value">clothes</span>
    </li>

    <li class="item">
        <span class="name">Phone</span>
        <span class="value">
            04142018071 09128983727
        </span>
    </li>

What I need to get is "Sara" and "selling potato in town", and "Phone" and "04142018071 09128983727". Can you help me?

I tried the following code:

for stng1 in soup.find_all('li', class_='item'):
    for stng in stng1.find_all('span'):
        # print(stng)
        if stng.has_attr("class"):
            if stng['class'] == 'name':
                print(stng.string)

  • it is your code efforts that we need to see - what methods are you using to scrape this data? What language? What problems have you encountered? – Professor Abronsius Dec 06 '19 at 07:40
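The `if stng['class'] == 'name'` test in the code above never matches: BeautifulSoup treats `class` as a multi-valued attribute, so `stng['class']` is a list such as `['name']`, not the string `'name'`. A minimal corrected sketch of the same loop, testing list membership instead. `html_doc` is a trimmed copy of the question's markup, used here only for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative copy of part of the markup from the question.
html_doc = """
<li class="item">
    <span class="name">Sara</span>
    <span class="value">selling potato in town</span>
</li>
<li class="item">
    <span class="name">Phone</span>
    <span class="value">
        04142018071 09128983727
    </span>
</li>
"""

soup = BeautifulSoup(html_doc, "html.parser")

names = []
for item in soup.find_all("li", class_="item"):
    for span in item.find_all("span"):
        # span["class"] is a list (e.g. ["name"]), so check membership
        # rather than comparing it to a plain string.
        if "name" in span.get("class", []):
            names.append(span.string)

print(names)  # ['Sara', 'Phone']
```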

2 Answers

from bs4 import BeautifulSoup

html_doc = """
<li class="item">
    <span class="name">Sara</span>
    <span class="value">selling potato in town</span>
</li>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns only the first matching <li class="item">
content = soup.find("li", {"class": "item"})

# Pull the text out of the two spans inside that item
name = content.find("span", {"class": "name"}).get_text()
value = content.find("span", {"class": "value"}).get_text()

print(name)
print(value)
Khadga shrestha
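Because `find` returns only the first match, the answer above prints a single name/value pair. A sketch that walks every `li.item` and pairs each name with its value, assuming all items share the markup shown in the question (again, `html_doc` is an illustrative copy of it). `get_text(strip=True)` trims the newlines and indentation around the phone numbers:

```python
from bs4 import BeautifulSoup

# Illustrative copy of the markup from the question.
html_doc = """
<li class="item">
    <span class="name">Sara</span>
    <span class="value">selling potato in town</span>
</li>
<li class="item">
    <span class="name">Grouping</span>
    <span class="value">clothes</span>
</li>
<li class="item">
    <span class="name">Phone</span>
    <span class="value">
        04142018071 09128983727
    </span>
</li>
"""

soup = BeautifulSoup(html_doc, "html.parser")

pairs = {}
for item in soup.select("li.item"):
    name = item.select_one("span.name")
    value = item.select_one("span.value")
    # Skip items that lack either span instead of raising AttributeError.
    if name is None or value is None:
        continue
    # strip=True removes leading/trailing whitespace from the text.
    pairs[name.get_text(strip=True)] = value.get_text(strip=True)

print(pairs["Sara"])   # selling potato in town
print(pairs["Phone"])  # 04142018071 09128983727
```

Collecting the results in a dict keyed by the name text lets you look up just the fields you need ("Sara", "Phone") and ignore the rest.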

Try this:

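from simplified_scrapy.simplified_doc import SimplifiedDoc

# html holds the page source (the markup shown in the question)
doc = SimplifiedDoc(html)
lst = doc.getElements(tag='li', value='item')
for item in lst:
    # Print each child span as class=text
    for child in item.getChildren():
        print('%s=%s' % (child['class'], child.text))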
dabingsou