
I am trying to get the data from this URL:
https://api.etherscan.io/api?module=account&action=tokentx&contractaddress=0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2&page=1&offset=100&sort=asc&apikey=YourApiKeyToken
However, I keep getting errors when I execute the following code:

import pandas as pd
import json
import urllib.request
from urllib.request import FancyURLopener

url = 'https://api.etherscan.io/api?module=account&action=tokentx&contractaddress=0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2&page='
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)     Chrome/37.0.2049.0 Safari/537.36'}
request_interval = 2  # interval

urls = []
df = []
if __name__ == '__main__':
    for i in range(1, 2):
        url = urllib.parse.urljoin(url, '&page='+str(i)+'&offset=10000&sort=asc&apikey=YourApiKeyToken')
        urls.append(str(url))

    for url in urls:
        headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}
        request = urllib.request.Request(url=url, headers=headers)
        html = urllib.request.urlopen(request).read()
        result = json.loads(html.decode('utf-8'))['blockNumber']
        df.extend(json.loads(html.decode('utf-8'))['blockNumber'])
        print('Completed URL : ', url)

pdf = pd.DataFrame(df)

pdf.to_csv("output.csv")

I have tried several solutions I found here on Stack Overflow:
urllib2.HTTPError: HTTP Error 400: Bad Request - Python
urllib2 HTTP Error 400: Bad Request

I have also changed the headers to

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}


and

{'Authorization': auth,
 'Content-Type': 'application/json',
 'Accept': 'application/json'}


but I still get the same error.

Thank you

Trey

1 Answer


urljoin serves a different purpose from the one you are using it for.

From the docs:

Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL. For example:

>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'

I am not sure you can use it to combine the query params of a URL.

With this, the URL you would get back from urljoin would be something like

https://api.etherscan.io/&page=1&offset=10000&sort=asc&apikey=YourApiKeyToken

which is wrong: urljoin has replaced the path and dropped the original query string.
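
You can verify this quickly in the interpreter (the base string below is just your URL split across two lines):

>>> from urllib.parse import urljoin
>>> base = ('https://api.etherscan.io/api?module=account&action=tokentx'
...         '&contractaddress=0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2&page=')
>>> urljoin(base, '&page=1&offset=10000&sort=asc&apikey=YourApiKeyToken')
'https://api.etherscan.io/&page=1&offset=10000&sort=asc&apikey=YourApiKeyToken'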

Use string concatenation instead. In the first for loop, change

url = urllib.parse.urljoin(url, '&page='+str(i)+'&offset=10000&sort=asc&apikey=YourApiKeyToken')

to

url = url + str(i) + '&offset=10000&sort=asc&apikey=YourApiKeyToken'

Also note that with this change you are still reassigning the result to the main url variable inside the for loop, so on the next iteration you would be appending the page and offset parts onto the first iteration's URL.
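
For example, if the loop ran for more than one page, the reassignment would compound (using a shortened, made-up base URL purely for illustration):

url = 'https://example.com/api?page='  # placeholder base, not the real endpoint
for i in range(1, 3):
    url = url + str(i) + '&offset=10000'
    print(url)
# https://example.com/api?page=1&offset=10000
# https://example.com/api?page=1&offset=100002&offset=10000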

Building on the above change, instead of

for i in range(1, 2):
    url = urllib.parse.urljoin(url, '&page='+str(i)+'&offset=10000&sort=asc&apikey=YourApiKeyToken')
    urls.append(str(url))

you could do

for i in range(1, 2):
    urls.append(url + str(i) + '&offset=10000&sort=asc&apikey=YourApiKeyToken')

Also, I hope you are aware that the first loop will run only once: range(1, 2) yields just [1], not [1, 2].
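
Putting both changes together, a minimal sketch of the whole script could look like this. Note that the JSON handling at the end is my assumption and goes beyond the URL issue above: the Etherscan response appears to return the transfer records as a list under the 'result' key, so indexing the top-level object with ['blockNumber'] as in your original code would not work; adjust the key to whatever the API actually returns for you.

import json
import urllib.request

import pandas as pd

base_url = ('https://api.etherscan.io/api?module=account&action=tokentx'
            '&contractaddress=0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2&page=')
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

urls = []
rows = []

if __name__ == '__main__':
    # Build one URL per page with plain string concatenation,
    # leaving base_url itself untouched.
    for i in range(1, 2):
        urls.append(base_url + str(i) + '&offset=10000&sort=asc&apikey=YourApiKeyToken')

    for url in urls:
        request = urllib.request.Request(url=url, headers=headers)
        html = urllib.request.urlopen(request).read()
        payload = json.loads(html.decode('utf-8'))
        # Assumption: the token transfers come back as a list of dicts under
        # 'result', each with fields such as 'blockNumber', 'hash', 'value'.
        rows.extend(payload['result'])
        print('Completed URL : ', url)

    pdf = pd.DataFrame(rows)
    pdf.to_csv("output.csv")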

Anbarasan