I use twill to navigate a website protected by a login form.

from twill.commands import *

go('http://www.example.com/login/index.php') 
fv("login_form", "identifiant", "login")
fv("login_form", "password", "pass")
formaction("login_form", "http://www.example.com/login/control.php")
submit()
go('http://www.example.com/accueil/index.php')

On this last page I want to download an Excel file, which is exposed through a div with the following attribute:

onclick="OpenWindowFull('../util/exports/control.php?action=export','export',200,100);"

With twill I can request the URL of the PHP script directly and display the content of the file:

go('http://www.example.com/util/exports/control.php?action=export')
show()

However, show() only prints the raw content as a string, which is unusable for a binary file. Is there a way to save the Excel file directly, similar to urllib.urlretrieve()?

Antoine Gautier
  • Looks similar to http://stackoverflow.com/questions/16283799/how-to-read-a-csv-file-from-a-url-python – dmitryro Jun 19 '16 at 19:03
  • Not exactly: in this case the access to the website is protected by a password. I need to post a login form. Thus using `twill`. (I would prefer to use `requests` but there seems to be an intricate control of login headers and after many attempts I could only make it work with `twill`). – Antoine Gautier Jun 19 '16 at 19:14
  • EDIT: I edited my question: the file is in MS Excel format, not CSV, so binary data... – Antoine Gautier Jun 19 '16 at 19:33
  • If you can show or read the content, you can also store it on your end in whatever format you read it. You can use StringIO https://docs.python.org/2/library/stringio.html or similar as intermediary storage for whatever you read and then convert it to CSV. – dmitryro Jun 19 '16 at 19:44

2 Answers

I managed to do it by logging in with twill and then passing twill's cookie jar to requests, which downloads the file as binary data.

Note: I could not use requests alone because the login involves an intricate check that I could not reproduce (I was not able to figure out the correct headers or other options).

import requests
from twill.commands import *

# showing login form with twill
go('http://www.example.com/login/index.php') 
showforms()

# posting login form with twill
fv("login_form", "identifiant", "login")
fv("login_form", "password", "pass")
formaction("login_form", "http://www.example.com/login/control.php")
submit()

# getting binary content with requests using twill cookie jar
cookies = requests.utils.dict_from_cookiejar(get_browser()._session.cookies)
url = 'http://www.example.com/util/exports/control.php?action=export'

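response = requests.get(url, stream=True, cookies=cookies)

# fail before creating the output file, so no empty out.xls is left behind
if not response.ok:
    raise Exception('Could not get file from ' + url)

# write the binary payload to disk in 1 KiB blocks
with open('out.xls', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)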
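
Since the question asks for something close to urllib.urlretrieve(), here is a minimal standard-library sketch of the same idea: send the session cookies in a Cookie header and copy the response bytes to disk. It assumes Python 3 (urllib.request rather than the Python 2 urllib module), and the names cookie_header and download_with_cookies are hypothetical helpers, not part of twill or requests. I have not tested it against the site above.

```python
import shutil
import urllib.request


def cookie_header(cookies):
    """Format a {name: value} dict as the value of a Cookie request header."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


def download_with_cookies(url, cookies, dest):
    """Fetch url with the given session cookies and save the raw bytes to dest.

    Hypothetical helper: cookies is a plain dict such as the one built
    from twill's cookie jar in the answer above.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("Cookie", cookie_header(cookies))]
    response = opener.open(url)
    try:
        # binary mode keeps the Excel bytes intact
        with open(dest, "wb") as handle:
            shutil.copyfileobj(response, handle)
    finally:
        response.close()
    return dest
```

With the cookies dict from the code above, the call would look like download_with_cookies(url, cookies, 'out.xls'). Whether the server accepts the session this way depends on the same login checks mentioned in the note, so treat it as a starting point.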
Antoine Gautier

Another way is to use a copy of twill.commands.save_html modified to open the output file in 'wb' mode instead of 'w'. See: Python 2.7 using twill, saving downloaded file properly
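
To illustrate why the file mode matters, here is a minimal sketch that is independent of twill. It assumes the response body is already available as a bytes object (how to obtain it depends on the twill version and is not shown) and that the export is a legacy .xls file, which is an OLE2 compound document starting with a fixed 8-byte signature. The names save_binary and looks_like_xls are hypothetical.

```python
import os

# First 8 bytes of an OLE2 compound file, the container used by legacy .xls.
# Assumption: the export is .xls; an .xlsx file would start with b"PK" instead.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def save_binary(content, path):
    """Write raw bytes to path in binary mode and return the size on disk."""
    if not isinstance(content, bytes):
        raise TypeError("expected bytes, got " + type(content).__name__)
    # 'wb' avoids newline translation and text encoding, both of which
    # can corrupt binary data
    with open(path, "wb") as handle:
        handle.write(content)
    return os.path.getsize(path)


def looks_like_xls(path):
    """Return True if the file starts with the OLE2 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE
```

The signature check is only a sanity test: if it fails, the saved file may be an HTML page (for example the login form) rather than the spreadsheet.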

Antoine Gautier