Scraper in Python gives "Access Denied"

Question

I'm trying to code a scraper in Python to get some info from a page, such as the titles of the offers listed on this page:
https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585

So far I'm using this code:

import bs4
import requests

def extract_source(url):
    source=requests.get(url).text
    return source

def extract_data(source):
    soup=bs4.BeautifulSoup(source)
    names=soup.findAll('title')
    for i in names:
        print(i)

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))

But when I run this code, the only output is:

<title>Access Denied</title>

What can I do to solve this?

Probably you should set the User-Agent. I'm voting to move this question over to Stack Overflow, where you can get more help. — Rápli András, Feb 01 '17 at 14:24

score 18 · Answer 1 · answered Feb 01 '17 at 14:52

As mentioned in the comments, you need to specify an acceptable User-Agent and pass it in the request headers:

def extract_source(url):
    # Send a browser-like User-Agent; with the default one the site returns "Access Denied"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    source = requests.get(url, headers=headers).text
    return source
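If you fetch several pages from the same site, a requests.Session lets you set the header once instead of passing it on every call. The sketch below is one way to do that, not the answerer's code: make_session and BROWSER_UA are hypothetical names, the 10-second timeout is an arbitrary choice, and raise_for_status() only helps if the site signals the block with a 4xx status (such as 403) rather than a 200 page that merely says "Access Denied". The last lines build the request without sending it, so the header can be checked offline.

```python
import requests

# Browser-like User-Agent copied from the answer above; which strings a given
# site accepts is an assumption and may change over time.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) "
    "Gecko/20100101 Firefox/50.0"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made with this session.
    session.headers.update({"User-Agent": user_agent})
    return session


def extract_source(url, session=None):
    session = session or make_session()
    response = session.get(url, timeout=10)  # timeout value is an assumption
    # Raise on 4xx/5xx responses (e.g. a 403 block) instead of parsing the error page.
    response.raise_for_status()
    return response.text


# Offline check: prepare the request without sending it and inspect its headers.
session = make_session()
prepared = session.prepare_request(
    requests.Request("GET", "https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585")
)
print(prepared.headers["User-Agent"])
```

Reusing one session also keeps cookies between requests, which some sites expect from a real browser.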

score 5 · Answer 2 · answered Feb 01 '17 at 15:27

Try this:

import bs4
import requests

def extract_source(url):
    agent = {"User-Agent": "Mozilla/5.0"}
    source = requests.get(url, headers=agent).text
    return source

def extract_data(source):
    soup = bs4.BeautifulSoup(source, 'lxml')  # requires the lxml package
    names = soup.findAll('title')
    for i in names:
        print(i)

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))

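I added 'lxml' as an explicit parser to avoid potential parsing problems: without one, BeautifulSoup guesses whichever parser is installed and emits a warning. Note that lxml has to be installed separately (pip install lxml).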
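If you are not sure lxml is installed, a sketch like the following tries it first and falls back to Python's built-in 'html.parser'. make_soup and page_title are illustrative names, not part of BeautifulSoup, and looking at the title is only a rough way to spot a block page like the one in the question.

```python
import bs4


def make_soup(source):
    """Parse HTML with lxml if it is installed, otherwise with the stdlib parser."""
    try:
        return bs4.BeautifulSoup(source, "lxml")
    except bs4.FeatureNotFound:
        # lxml is not available; html.parser ships with Python itself.
        return bs4.BeautifulSoup(source, "html.parser")


def page_title(source):
    """Return the text of the first <title> tag, or None if there isn't one."""
    soup = make_soup(source)
    return soup.title.get_text(strip=True) if soup.title else None


blocked = "<html><head><title>Access Denied</title></head><body></body></html>"
print(page_title(blocked))  # Access Denied
```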

score 1 · Answer 3 · edited Nov 22 '22 at 17:49


When using

def extract_source(url):
    headers = {"User-Agent":"Mozilla/5.0"}
    source=requests.get(url, headers=headers).text
    return source

the output is:

<title>Saree Retailers in Panipat - Best Deals online - Justdial</title>

Add a User-Agent header to your request; some sites refuse to serve requests that don't send one.
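To see why the header matters, you can compare what requests would send with and without it. By default requests identifies itself as python-requests/<version>, which is easy for a site to recognise and filter; whether this particular site blocks on that header alone, rather than on other signals as well, is an assumption. The sketch prepares both requests without sending them, so it runs offline.

```python
import requests

# The User-Agent requests sends when you don't supply one.
print(requests.utils.default_user_agent())

url = "https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585"
session = requests.Session()

# Build (but do not send) two requests to compare their outgoing headers.
plain = session.prepare_request(requests.Request("GET", url))
custom = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": "Mozilla/5.0"})
)

print(plain.headers["User-Agent"])   # starts with "python-requests/"
print(custom.headers["User-Agent"])  # Mozilla/5.0
```

A plain requests.get(url) goes through the same session defaults, which is why the original code was identified as python-requests.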

answered Feb 01 '17 at 14:53 by 宏杰李 · edited by Davide Fiocco
