
Trying to filter the product name list using the header tags, but it always returns None.

source : https://www.tendercuts.in/chicken

code :

import requests
from bs4 import BeautifulSoup

def ExtractData(url):
    response = requests.get(url=url).content
    soup = BeautifulSoup(response, 'lxml')
    header = soup.find("mat-card-header", {"class": "mat-card-header ng-tns- c9-188"})
    print(header)

ExtractData(url="https://www.tendercuts.in/chicken")
HedgeHog

3 Answers


Here's code that iterates over all the <mat-card-header> items, showing each header's class attribute and the text of its associated mat-card-title. You can further filter on the child elements of each header to find particular products.

soup = BeautifulSoup(response, 'lxml')
headers = soup.find_all("mat-card-header")
for header in headers:
    print(header.get('class'), header.find('mat-card-title').text)

Output:

['mat-card-header', 'ng-tns-c9-3'] Chicken Curry Cut (Skin Off)
['mat-card-header', 'ng-tns-c9-3'] Chicken Curry Cut (Skin Off)
...
['mat-card-header', 'ng-tns-c9-19'] Chicken Wings
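As a sketch of the "further filter on the child elements" idea, here is a self-contained example. The markup is a hand-written, simplified stand-in for the real page (not a live fetch), and the stdlib html.parser is used so nothing beyond BeautifulSoup is required:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup, assuming the site nests
# <mat-card-title> inside <mat-card-header> as in the output above.
html = """
<mat-card-header class="mat-card-header ng-tns-c9-3">
  <mat-card-title>Chicken Curry Cut (Skin Off)</mat-card-title>
</mat-card-header>
<mat-card-header class="mat-card-header ng-tns-c9-19">
  <mat-card-title>Chicken Wings</mat-card-title>
</mat-card-header>
"""
soup = BeautifulSoup(html, 'html.parser')  # 'lxml' works too, if installed

# Keep only the headers whose title mentions a particular product.
wings = [h for h in soup.find_all('mat-card-header')
         if 'Wings' in h.find('mat-card-title').text]

print([h.find('mat-card-title').text for h in wings])  # ['Chicken Wings']
```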
CodeMonkey

What happens?

You are trying to find your tags by a class that does not exist in your soup, either because the class is generated dynamically or because of a typo.
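A minimal, self-contained demonstration of that failure mode. The markup is a hand-written stand-in for the server response (not a live fetch), with an Angular class suffix chosen to differ from the one in the question:

```python
from bs4 import BeautifulSoup

# Stand-in for the server-side HTML: the dynamic class suffix here (c9-3)
# differs from the one seen in the browser dev tools (c9-188).
html = ('<mat-card-header class="mat-card-header ng-tns-c9-3">'
        '<mat-card-title>Chicken Wings</mat-card-title></mat-card-header>')
soup = BeautifulSoup(html, 'html.parser')

# Matching on the dynamic class fails ...
print(soup.find('mat-card-header', {'class': 'ng-tns-c9-188'}))  # None
# ... while matching on the tag alone succeeds.
print(soup.find('mat-card-title').text)  # Chicken Wings
```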

How to fix?

Select your elements more specifically, by tag or id, and avoid classes, since these are often generated dynamically:

[t.text for t in soup.find_all('mat-card-title')]

To avoid the duplicates, just call set() on the result:

set([t.text for t in soup.find_all('mat-card-title')])

Example

import requests
from bs4 import BeautifulSoup

URL = 'https://www.tendercuts.in/chicken'
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'lxml')

print(set([t.text for t in soup.find_all('mat-card-title')]))

Output

{'Chicken Biryani Cut - Skin On','Chicken Biryani Cut - Skinless','Chicken Boneless (Cubes)','Chicken Breast Boneless','Chicken Curry Cut (Skin Off)','Chicken Curry Cut (Skin On)','Chicken Drumsticks',     'Chicken Liver','Chicken Lollipop','Chicken Thigh & Leg (Boneless)','Chicken Whole Leg','Chicken Wings','Country Chicken','Minced Chicken','Premium Chicken-Strips (Boneless)','Premium Chicken-Supreme (Boneless)','Smoky Country Chicken (Turmeric)'}

EDIT

To get titles, prices, etc., I would recommend iterating the mat-cards in the following way.

import re
import requests
from bs4 import BeautifulSoup

URL = 'https://www.tendercuts.in/chicken'
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'lxml')

data = []
# Every card appears twice in the markup, so take every second match.
for item in soup.select('mat-card:has(mat-card-title)')[::2]:
    data.append({
        'title': item.find('mat-card-title').text,
        # search within this card, not the whole soup, so each
        # product gets its own price
        'price': re.search(r'₹\d*', item.find('p', class_='current-price').text).group(),
        'weight': w if (w := item.select_one('.weight span span:last-of-type').next_sibling) else None
    })

print(data)

Output

[{'title': 'Chicken Curry Cut (Skin Off)', 'price': '₹99', 'weight': 'Customizable'}, {'title': 'Chicken Curry Cut (Skin On)', 'price': '₹99', 'weight': 'Customizable'}, {'title': 'Country Chicken', 'price': '₹99', 'weight': 'Customizable'}, {'title': 'Premium Chicken-Supreme (Boneless)', 'price': '₹99', 'weight': ' 330 - 350 Gms'}, {'title': 'Chicken Boneless (Cubes)', 'price': '₹99', 'weight': ' 480 - 500 Gms'}, {'title': 'Chicken Drumsticks', 'price': '₹99', 'weight': ' 280 - 360 Gms'}, {'title': 'Chicken Biryani Cut - Skin On', 'price': '₹99', 'weight': ' 480 - 500 Gms'}, {'title': 'Chicken Thigh & Leg (Boneless)', 'price': '₹99', 'weight': ' 480 - 500 Gms'}, {'title': 'Chicken Biryani Cut - Skinless', 'price': '₹99', 'weight': ' 480 - 500 Gms'}, {'title': 'Minced Chicken', 'price': '₹99', 'weight': ' 480 - 500 Gms'}, {'title': 'Smoky Country Chicken (Turmeric)', 'price': '₹99', 'weight': ' 650 - 800 Gms'}, {'title': 'Chicken Lollipop', 'price': '₹99', 'weight': ' 280 - 300 Gms'}, {'title': 'Chicken Whole Leg', 'price': '₹99', 'weight': ' 370 - 390 Gms'}, {'title': 'Chicken Breast Boneless', 'price': '₹99', 'weight': ' 240 - 280 Gms'}, {'title': 'Premium Chicken-Strips (Boneless)', 'price': '₹99', 'weight': ' 330 - 350 Gms'}, {'title': 'Chicken Liver', 'price': '₹99', 'weight': ' 190 - 210 Gms'}, {'title': 'Chicken Wings', 'price': '₹99', 'weight': ' 480 - 500 Gms'}]
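As a side note, the [::2] slice assumes every card appears exactly twice in the markup. If that assumption ever breaks, deduplicating by title is a safer sketch. The rows below are sample data in the same shape as the scraped data, not real output:

```python
# Sample rows in the same shape as the scraped `data` above (illustrative).
rows = [
    {'title': 'Chicken Wings', 'price': '₹99'},
    {'title': 'Chicken Wings', 'price': '₹99'},
    {'title': 'Chicken Liver', 'price': '₹99'},
]

seen = {}
for row in rows:
    seen.setdefault(row['title'], row)  # keep the first occurrence per title

unique = list(seen.values())
print([r['title'] for r in unique])  # ['Chicken Wings', 'Chicken Liver']
```

Since dicts preserve insertion order, the products come out in page order, unlike a plain set().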
HedgeHog

This is the most common problem with web scraping: most websites use JavaScript to change or add to the content on the page after loading the initial page. Whatever the JavaScript is supposed to change or load isn't on the page after the initial request.

The same is true for your code. If you look at the actual HTML (not in a browser, in your code), you'll find that it has many fields that angular.js code will be filling in later.

You'll need to load your page using a package like selenium, which uses a browser driver to load the page, execute the JavaScript and make the result available to you. (It does a lot more, like letting you navigate the site by clicking elements, filling out fields, etc.)

selenium is a complex library with many options, but you can get started with:

pip install selenium

And by downloading a browser driver like Gecko Driver or ChromeDriver.

And then something like this will work:

from selenium import webdriver
# pick ONE of these, depending on your browser:
from selenium.webdriver.chrome.service import Service    # for Chrome
# from selenium.webdriver.firefox.service import Service  # for Firefox

service = Service('/path/to/driver')
service.start()

driver = webdriver.Remote(service.service_url)
driver.get('https://www.tendercuts.in/chicken')

# do something with what driver loaded here

driver.quit()

You could just bs4 your way through driver.page_source, but since you now have selenium anyway, you could also look into the ways selenium allows you to find and select elements, like using the built-in XPath functions.

Grismar
  • Can whoever voted down the answer please provide a comment on what is wrong with it? – Grismar Feb 11 '22 at 20:58
  • Not the voter, but this answer seems like a rehash of a [canonical thread](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python/) that it'd be best just to link to. – ggorlen Feb 11 '22 at 20:59
  • The same information is certainly in there, thanks - will vote to close, because that has all bases covered. – Grismar Feb 11 '22 at 21:03
  • Thanks, although it might not apply to OP's case here. It seems their data is in the static markup and it's just a typo on the class. – ggorlen Feb 11 '22 at 21:20