
Usually I can pick up all the hrefs, but this script doesn't scrape anything and I can't figure out why.

Here's my script:

import warnings
warnings.filterwarnings("ignore")

import re
import json
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url).content, "html.parser")



#get the links

all_title = soup.find_all('a', class_ = 'prod-item__container')

data_titles = []
for title in all_title:
    # skip anchors that have no href attribute
    href = title.get('href')
    if href:
        data_titles.append(href)

print(data_titles)

data = pd.DataFrame({
    'links' : data_titles
    })

data.to_csv("testlink.csv", sep=';', index=False)

Here's the HTML:


It seems that soup.find_all('a', class_='prod-item__container') should work, but it doesn't.

Any ideas why?

FalconBob

2 Answers


Use headers in your request to get the content. Some sites serve different responses depending on the user agent to deter scraping and crawling:

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

Example:

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

#get the links

all_title = soup.find_all('a', class_ = 'prod-item__container')

data_titles = []
for title in all_title:
    try:
        product_link = title['href']
        data_titles.append(product_link)
    except:
        pass

print(data_titles)
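If the collected hrefs turn out to be relative paths, you may also want to convert them to absolute URLs before writing the CSV. A minimal sketch using the standard library's urljoin (the sample href below is hypothetical, just to show the mechanics):

```python
from urllib.parse import urljoin

base_url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"

# hypothetical relative href, as might be scraped from the page
data_titles = ["/fr/produit/montre-longines-123"]

# urljoin resolves relative paths against the base URL
# and leaves already-absolute URLs unchanged
absolute_links = [urljoin(base_url, href) for href in data_titles]
print(absolute_links)
```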
HedgeHog

To get the data, we need to pass user-agent details for this website. Use the code below:

url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36',
}
soup = BeautifulSoup(requests.get(url, headers=header).content, "html.parser")

Manoj biroj