
Context:

I am trying to code my own money aggregator because most of the available tools on the market do not cover all financial websites yet. I am using Python 2.7.9 on a Raspberry Pi.

I managed to connect to 2 of my accounts so far (one crowd-lending website and one for my pension) thanks to the requests library. The third website I am trying to aggregate has been giving me a hard time for 2 weeks now: https://www.amundi-ee.com.

I figured out that the website is actually using JavaScript, and after much research I ended up using dryscrape (I cannot use Selenium because ARM is not supported anymore).

Issue:

When running this code:

import dryscrape

url='https://www.amundi-ee.com'
extensionInit='/psf/#login'
extensionConnect='/psf/authenticate'
extensionResult='/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult

s = dryscrape.Session()
s.visit(urlInit)
print s.body()
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)
print s.body()
s.visit(urlResult)

There is an issue when the code visits urlConnect (the `s.visit(urlConnect)` line); the body printed by the following `print s.body()` returns the below:

{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}
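The error message itself hints at the cause: `s.visit()` issues a plain GET request, while the `/authenticate` route is registered for POST only. A minimal sketch of that kind of method-based routing (hypothetical handler and route table, not the site's actual backend) reproduces the same 405:

```python
# Sketch of method-based routing: the same path can be registered
# for POST but not GET, producing exactly this kind of 405 response.
ROUTES = {
    ("POST", "/authenticate"): lambda form: {"code": 200, "message": "ok"},
}

def dispatch(method, path):
    handler = ROUTES.get((method, path))
    if handler is None:
        allowed = [m for (m, p) in ROUTES if p == path]
        if allowed:
            return {"code": 405,
                    "message": "No route found for \"%s %s\": "
                               "Method Not Allowed (Allow: %s)"
                               % (method, path, ", ".join(allowed))}
        return {"code": 404, "message": "No route found"}
    return handler(None)

print(dispatch("GET", "/authenticate"))   # code 405, Allow: POST
print(dispatch("POST", "/authenticate"))  # code 200
```

So visiting the authentication URL directly can never work; the credentials have to arrive as a POST, which is what a real form submission (or button click) does.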

Question:

Why do I have such an error message, and how can I log in to the website properly to retrieve the data I am looking for?

PS: My code was inspired by this question: Python dryscrape scrape page with cookies

  • Use time.sleep(5) after login. Then try again and tell if error occurs – Exprator May 07 '17 at 15:11
  • Sorry I could not test it earlier, unfortunately after putting the sleep right after login (login.form().submit()) issue still occurs (I also try to double the time) – JB Rolland May 12 '17 at 14:35
  • do one thing after that login form submit, print the current url like this, s.url() and check if the url is the one you want to scrape. then store it in a variable and then s.visit(url), because if you try to access a page which is login protected it will give error – Exprator May 12 '17 at 15:35
  • ok so I tried s.url() and the print result is: https://www.amundi-ee.com/psf/?mail=XXXXXXXX&password=#login (with XXXXXXXX being my login previously mentioned in my code). I was not really trying to scrap this url before but if I am copy pasting it in a web browser (Chrome) and I am already connected then I am getting redirected on https://www.amundi-ee.com/psf/?mail=XXXXXXXX&password=# and **it got the data I want to scrap!** if I am now visiting this url via s.visit(url) I don't get the data I want but a message in french saying "Your web browser version is not compatible blabla" – JB Rolland May 13 '17 at 02:41

1 Answer


OK, so after more than a month of trying to tackle this, I am delighted to say that I finally managed to get what I want.

What was the issue?

Basically 2 major things (maybe more, but I might have forgotten some along the way):

  1. the password has to be entered via on-screen buttons, and those are randomly shuffled, so every time you access the page you need to build a new digit-to-button mapping
  2. login.form().submit() was interfering with access to the page holding the needed data; clicking the validate button instead was good enough
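The keypad mapping from point 1 can be sketched in isolation: given the shuffled digit labels scraped from the ten buttons, build a digit-to-position map and translate the password into the sequence of buttons to click (the helper names and the sample label list are illustrative, not the site's actual markup):

```python
# Sketch of the shuffled-keypad mapping (point 1 above).
# 'labels' stands in for the digit printed on each of the ten
# buttons as scraped from the page; the order changes every visit.
def build_keypad_map(labels):
    """Map each digit to the 1-based button position displaying it."""
    return {int(text): position + 1
            for position, text in enumerate(labels)}

def buttons_for_password(labels, password):
    """Translate a digit password into the button positions to click."""
    mapping = build_keypad_map(labels)
    return [mapping[int(d)] for d in password]

# Example: one randomly ordered keypad as seen on a given visit
labels = ["3", "9", "0", "5", "7", "1", "8", "2", "6", "4"]
print(buttons_for_password(labels, "2135"))  # [8, 6, 1, 4]
```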

Here is the final code; do not hesitate to comment if you spot bad usage, as I am a Python novice and a sporadic coder.

import dryscrape
from bs4 import BeautifulSoup
from time import sleep
from decimal import Decimal
import sys


def getAmundi(seconds=0):

    url = 'https://www.amundi-ee.com/psf'
    extensionInit='/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry=1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb 
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..." 
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')
    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds+=timeoutRetry
            print "Failed, retrying to get the login field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    #get password button mapping
    print "logging in ..."
    soup = BeautifulSoup(s.body())
    # map each displayed digit to its button index (keypad is shuffled)
    button_number = range(10)
    for x in range(0, 10):
        button_number[int(soup.findAll('button')[x].text.strip())] = x

    #needed button
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    #push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) +']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) +']')
    button.click()
    # ... same pattern for the remaining password digits ...

    # Push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    while True:
        try:
            soup = BeautifulSoup(s.body())
            total_lended = soup.findAll('span')[8].text.strip()
            total_lended = Decimal(total_lended.encode('ascii','ignore').replace(',','.').replace(' ',''))
            print total_lended

        except Exception as ex:
            seconds+=1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    s.reset()
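The Decimal conversion at the end deals with French-formatted amounts (comma as decimal separator, spaces or non-breaking spaces as thousands grouping, plus a currency symbol that the `.encode('ascii','ignore')` silently drops). Isolated as a helper, the same cleanup looks like this (the helper name is mine, not from the code above):

```python
from decimal import Decimal

def parse_french_amount(text):
    """Convert a French-formatted amount such as u'1 234,56 \u20ac'
    to a Decimal, dropping the currency symbol and grouping spaces."""
    cleaned = text.replace(u"\u00a0", u" ")  # normalize non-breaking spaces
    cleaned = u"".join(c for c in cleaned if c.isdigit() or c == u",")
    return Decimal(cleaned.replace(u",", u"."))

print(parse_french_amount(u"1 234,56 \u20ac"))  # 1234.56
```

Keeping only digits and the comma is a bit more robust than chained `.replace()` calls, since it also survives stray whitespace variants in the scraped span.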