
Hi, I would like to automate a process to download a file from a site that has a form login. When I use a browser, I can see a cookie in the request HTTP header, which seems to be required in order to be authorised successfully; otherwise I end up with a 401 error. Even if I send the request twice, it doesn't work, as the first response doesn't contain the required cookie. Any suggestions on whether it is feasible to obtain the cookie for the request HTTP header using Python?

Url to login: https://services.geoplace.co.uk/login

Url to download required file: https://services.geoplace.co.uk/api/downloadMatrix/getFile?fileName=30001_81s3.zip&fileType=LEVEL_3&fileVersion=May-2020&sfAccountId=xxx

import mechanize
import cookielib
from bs4 import BeautifulSoup as bs
import html2text
import html5lib
import sys

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Chrome')]

# The site we will navigate into, handling its session
br.open('https://login.geoplace.co.uk/login')

# View available forms
for f in br.forms():
    print "Formm " + str(f)

# Select the first form (index zero)
br.select_form(nr=0)

# User credentials
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'

# Login
response = br.submit()

br.open('https://services.geoplace.co.uk')
request = br.request
print request.header_items()

# if successful we have some cookies now (this is the same jar we attached above)
cookies = cj
# convert cookies into a dict usable by requests
cookie_dict = {}
for c in cookies:
    cookie_dict[c.name] = c.value
print cookie_dict


br.open('https://services.geoplace.co.uk/api/downloadMatrix/getFile?fileName=30001_81s3.zip&fileType=LEVEL_3&fileVersion=May-2020&sfAccountId=xxx')
  • It would be better if you provided your current code or some info, like which HTTP client you are using. I suggest you check out the Python 'requests' library. – mursalin Jun 25 '20 at 11:19
  • https://stackoverflow.com/questions/7164679/how-to-send-cookies-in-a-post-request-with-the-python-requests-library this might be helpful. – mursalin Jun 25 '20 at 11:20
  • I've added the code I am using. Once logged in, I was expecting to get all the cookies needed for the next request, but that doesn't seem to be the case. – user3050151 Jun 25 '20 at 13:01
  • Did you check 'cj' to see whether cookies were set or whether it's empty? – mursalin Jun 25 '20 at 13:29
  • Yes, the cookies are actually set (see the code at the bottom). However, they don't seem correct, which is why the last line fails with a 401 error. Should I expect the cookies to be similar to the browser's? – user3050151 Jun 25 '20 at 14:25

1 Answer


The APIs you mentioned above support the OAuth2 Client and Password grant types. If you ask GeoPlace for help (via support@geoplace.co.uk), we will work on your request to create client credentials and you should be able to access it (we have other entities consuming our services this way).

Once you get the credentials, here are the steps (a rough Python equivalent follows the list):

  1. curl 'https://login.geoplace.co.uk/oauth/token' -H "Authorization: Basic ZZZZZZZZZZZZZZZZZZZ" -d username='xxxxxxx' -d password='yyyyyyy' -d grant_type=password

    (This will return your token info.)

  2. Using the token above, perform: curl -H "Authorization: Bearer 66666-yyy-Ysdf-bb-xxxxx" -o 'FILE_NAME_TO_SAVE_IN_LOCAL.zip' 'https://services.geoplace.co.uk/api/downloadMatrix/getFile?fileName=30001_81s3.zip&fileType=LEVEL_3&fileVersion=May-2020&sfAccountId=xxx'
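
If you would rather do both steps from Python than from the command line, a rough 'requests' equivalent of the two curl commands above looks like this (the client id/secret pair behind the Basic header and the 'access_token' field name are placeholders/assumptions to be replaced with the values GeoPlace issues):

import requests

# Step 1: password-grant token request (Basic auth = client id + secret)
token_resp = requests.post(
    'https://login.geoplace.co.uk/oauth/token',
    auth=('CLIENT_ID', 'CLIENT_SECRET'),   # placeholder client credentials
    data={
        'grant_type': 'password',
        'username': 'xxxxxxx',
        'password': 'yyyyyyy',
    },
)
token_resp.raise_for_status()
access_token = token_resp.json()['access_token']   # assumed standard OAuth2 response field

# Step 2: download the file using the Bearer token
file_resp = requests.get(
    'https://services.geoplace.co.uk/api/downloadMatrix/getFile',
    headers={'Authorization': 'Bearer ' + access_token},
    params={
        'fileName': '30001_81s3.zip',
        'fileType': 'LEVEL_3',
        'fileVersion': 'May-2020',
        'sfAccountId': 'xxx',
    },
)
file_resp.raise_for_status()
with open('FILE_NAME_TO_SAVE_IN_LOCAL.zip', 'wb') as f:
    f.write(file_resp.content)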