How to scrape nested tag elements with Python

Question

Hi, I would like to get the data inside the <del> and <ins> tags shown below, but I could not find a solution. Does anyone have an idea how to scrape this, and is there a short way to get this information?

This is my Python code:

  import requests
  import json
  from bs4 import BeautifulSoup
  
  header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
  
  base_url = "https://www.n11.com/super-firsatlar"
  
  r = requests.get(base_url,headers=header)
  
  if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li',attrs={"class":"column"})
  
    result=[]
    for book in books:
      title=book.find('h3').text
      link=base_url +book.find('a')['href']
      picture = base_url + book.find('img')['src']
  
      price = soup.find('a',attrs={"class":"ins"})
  
  
  
      single ={'title':title,'link':link,'picture':picture,'price':price}
      result.append(single)
      with open('book.json','w', encoding='utf-8') as f:
        json.dump(result ,f,indent=4,ensure_ascii=False)
  else:
    print(r.status_code)

<div class="proDetail">
  <a href="https://test.com" class="oldPrice" title="Premium">
    <del>69,00 TL</del>
  </a>
  <a href="https://test.com" class="newPrice" title="Premium">
    <ins>14,90</ins>
  </a>
</div>

And this is my output:

{
    "title": "Premium",
    "link": "https://test.com",
    "picture": "https://pic.gif",
    "price": null
},

K3it0 · Accepted Answer · 2020-10-31T11:30:43.220

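You are searching for the wrong thing: `ins` is a tag name, not a class, and you are searching the whole `soup` instead of the current `book`. First search inside `book` for the class 'newPrice' to get the a-block: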

price = book.find('a', attrs={'class': 'newPrice'})

Then you can search inside this a-block for the ins-block:

price = book.find('a', attrs={'class': 'newPrice'}).find('ins')

Then your result would look like:

<ins>14,90</ins>

For the final result, take the tag's text and strip the surrounding whitespace:

price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text.strip()
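Putting the steps above together, a minimal sketch run against the HTML sample from the question might look like this. The `extract_price` helper and the `SAMPLE_HTML` constant are hypothetical names introduced for illustration, and the `None` checks are an added assumption that some items on the real page may lack a price link or the nested tag.

```python
from bs4 import BeautifulSoup

# HTML sample from the question (SAMPLE_HTML is a name made up for this sketch).
SAMPLE_HTML = """
<div class="proDetail">
  <a href="https://test.com" class="oldPrice" title="Premium">
      <del>69,00 TL</del></a>
  <a href="https://test.com" class="newPrice" title="Premium">
     <ins>14,90</ins>
  </a>
</div>
"""


def extract_price(block, link_class, tag_name):
    """Return the stripped text of <tag_name> nested in <a class=link_class>, or None."""
    link = block.find('a', attrs={'class': link_class})
    if link is None:
        return None  # no matching <a> in this block
    tag = link.find(tag_name)
    if tag is None:
        return None  # the <a> exists but has no nested tag
    return tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
block = soup.find('div', attrs={'class': 'proDetail'})

new_price = extract_price(block, 'newPrice', 'ins')
old_price = extract_price(block, 'oldPrice', 'del')
print(new_price)
print(old_price)
```

In the question's loop, `block` would be each `book` item rather than the whole `soup`, so every product gets its own price.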

edited Oct 31 '20 at 11:30

answered Oct 31 '20 at 10:57

I used book instead of soup and now it works, thanks so much. I have one more issue: the price comes out like this, "price": "189,90\n TL". How can I delete the "\n" and the spaces? – Mehmet Reşat Demir Oct 31 '20 at 11:04
@MehmetReşatDemir Did you get past the Not JSON serializable error? – Abhishek Rai Oct 31 '20 at 11:13
@AbrarAhmed It did not work correctly, and this answer solved my error; this method is easier. – Mehmet Reşat Demir Oct 31 '20 at 11:15
@MehmetReşatDemir I don't mean finding the elements. You can find those simply with `find('ins')` if the tags are unique. I meant: are you able to write it to JSON? – Abhishek Rai Oct 31 '20 at 11:16
@MehmetReşatDemir please look into this post for deleting spaces etc. https://stackoverflow.com/questions/10711116/strip-spaces-tabs-newlines-python – K3it0 Oct 31 '20 at 11:47
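
For the two follow-up points raised in the comments (the stray "\n" and the JSON output), here is a rough sketch using only the standard library. `clean_price` is a hypothetical helper, and the sample value is the one reported in the thread. It collapses every run of whitespace, newlines included, into a single space. The result is a plain string, which `json.dumps` can serialize, whereas a BeautifulSoup tag object cannot be serialized directly.

```python
import json


def clean_price(raw):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(raw.split())


raw_price = "189,90\n TL"  # value reported in the comment thread
item = {"title": "Premium", "price": clean_price(raw_price)}

# ensure_ascii=False keeps non-ASCII characters readable, as in the question's code.
serialized = json.dumps([item], indent=4, ensure_ascii=False)
print(serialized)
```

In the scraping loop it is probably also enough to write the file once after the loop finishes, rather than reopening it for every item.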
