
I've looked around and tried a bunch of different things, but can't seem to find any info on this topic.

I'm trying to scrape info from my bank (Discover) and wrote a script to do so. The request itself goes through fine, but it returns a "logged out" page instead of the desired homepage with my balance.

My messy code is as follows:

import requests
from bs4 import BeautifulSoup as bs

def scrapeDiscover():
    # Login endpoint the signin form POSTs to
    URL = 'https://portal.discover.com/customersvcs/universalLogin/signin'

    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0',
        'Origin': 'https://portal.discover.com',
        'Referer': 'https://portal.discover.com/customersvcs/universalLogin/ac_main',
    }

    # A Session persists cookies across requests, so the homepage GET below
    # reuses whatever cookies the login POST sets
    s = requests.Session()

    PAYLOAD = {
        'userID': 'username',
        'password': 'password',
        'choose-card': 'Credit Card',
        'pm_fp': 'version=-1&pm_fpua=mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/95.0.4638.69 safari/537.36|5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36|Linux x86_64&pm_fpsc=24|1920|1080|1053&pm_fpsw=&pm_fptz=-6&pm_fpln=lang=en-US|syslang=|userlang=&pm_fpjv=0&pm_fpco=1',
        'currentFormId': 'login',
        'userTypeCode': 'C',
        'rememberOption': 'on',
    }

    login_req = s.post(URL, headers=HEADERS, data=PAYLOAD)
    soup = bs(s.get('https://card.discover.com/cardmembersvcs/achome/homepage').text, 'html.parser')

    # Currently dumps the whole page's text -- this is where the "logged out"
    # page shows up instead of the balance
    balance = soup.text
    print(balance)

scrapeDiscover()
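
For anyone digging into this, a quick sanity check (generic requests calls, nothing Discover-specific) can go right after the login POST inside scrapeDiscover() to see whether the POST itself looks like it succeeded:

    # Inside scrapeDiscover(), immediately after the login POST:
    print(login_req.status_code)              # sites often return 200 even on a failed login
    print(login_req.url)                      # a redirect back to the signin page hints at failure
    print(list(s.cookies.get_dict().keys()))  # which session cookies actually got set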

I also looked at the POST request info needed and captured it in a screenshot (image not included here).

Any help or suggestions would be super appreciated! Even just a suggestion would help a ton. Thanks so much, all! Let me know if more information is needed.

EDIT: Added information. I imagine there's probably some missing cookie or token in the POST request, but I've pored over the code many times and can't find anything that works when implemented, or tell whether I'm even implementing it correctly.

A couple of things stand out to me:

ssid: In the 'Form Data' of the POST request that works, there's an 'ssid' field with a long string. However, it changes every time, and I assumed it stood for 'session ID' and that I didn't need it, since my code was creating a new session.

ssid: 0433c923-6f48-4832-8d6d-b26c5b0e6d4-1637097180562

STRONGAUTHSVCS: Another thing that stood out was this 'STRONGAUTHSVCS' variable (nested within the long string of cookies, in both the request and received headers):

STRONGAUTHSVCS=SASID=null&SATID=b081-

sectoken: Lastly, I saw the word 'token' and thought this could be it: a cookie variable named 'sectoken'. No idea what it is, though, or how I would implement it.

sectoken=hJNQgh7EOnH1xx1skqQqftbV/kE=

With all of these, I've tried my best at implementing them in the headers in my code, but it seemed to have no effect on the output. I've attached a pastebin of the site cookies and form data captured (minus any sensitive data). If anyone has any ideas, I'd be super thankful! https://pastebin.com/PNnV6Mpw
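
For completeness, the kind of thing I've been attempting for the token theory looks roughly like this: fetch the signin page first, copy every hidden form value (ssid included, if it's there) into the payload, then POST. This assumes the values are rendered as hidden input elements on the signin page, which I haven't been able to confirm:

login_page = s.get(URL, headers=HEADERS)
page_soup = bs(login_page.text, 'html.parser')
for hidden in page_soup.find_all('input', type='hidden'):
    name = hidden.get('name')
    if name:  # copy whatever hidden fields the form carries (ssid, sectoken, ...)
        PAYLOAD[name] = hidden.get('value', '')
login_req = s.post(URL, headers=HEADERS, data=PAYLOAD)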

1 Answer

Read this. I think you probably need a token for your POST request, for security reasons. If just the scraping is important, try using Selenium.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://portal.discover.com/customersvcs/universalLogin/signin")
driver.maximize_window()

# Wait up to 5 seconds for the top-right log-in link to become clickable,
# then click it (note: .click() returns None, so log_in ends up holding None)
log_in = (
    WebDriverWait(driver, 5)
    .until(
        EC.element_to_be_clickable(
            (By.XPATH, "/html/body/div[1]/header/div[1]/div[2]/div[2]/ul/li[3]/a")
        )
    )
    .click()
)

# Fill in the credentials and submit (find_element_by_* is the Selenium 3 API)
driver.find_element_by_xpath("//*[@id='userid']").send_keys("your_user_id")
driver.find_element_by_xpath("//*[@id='password']").send_keys("your_password")
driver.find_element_by_xpath("//*[@id='log-in-button']").click()

I got an error when I used the left panel to log in, so the code above uses the log-in link at the top right instead.
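
After logging in you can start scraping. As a rough sketch (the wait target here is just the page body; the actual balance element's locator is something you'd need to find by inspecting the page yourself):

# Wait for the post-login page to render, then dump its visible text;
# swap the locator for the real balance element once you've identified it.
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "body"))
)
print(driver.find_element_by_tag_name("body").text)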

– A D
  • Yeah, that's what I was thinking too. I pored over all the data again and again, and added some info up above about some of my working theories. Still no solution as of yet, so any more ideas you have would be greatly appreciated! – Ultracrepidarian Nov 16 '21 at 21:24
  • 1
    Maybe this [answer](https://stackoverflow.com/a/45868297) can help you. My answer is also edited. I am using `selenium==3.141.0` and `geckodriver 0.24.0`. [screenshot](https://ibb.co/xg4XxRQ). After log in you can start scraping. – A D Nov 17 '21 at 00:30
  • Thanks so much! I actually discovered Selenium right after I edited my post and was trying it out before I left for work, but didn't have a chance to get anywhere. I'll try this when I get back! – Ultracrepidarian Nov 17 '21 at 03:41
  • 1
    Success!! I used your code snipped and tried to use chrome, but got an outdated browser message. I switched to firefox, (I really have no preference. It was just the scraping that mattered to me) and it worked fine!! Now just to work on getting the info itsself. Out of curiousity, what exactly does the log_in variable do? I'm new to python and just trying to learn. Thanks so much again! – Ultracrepidarian Nov 17 '21 at 06:00
  • 1
    I'm glad I could help you. Thanks for asking! The page had two options for logging in, one at the left and one at the top right. As I was trying to click the left log in button, I received an error that I could not solve quickly. So, I tried to log in using the other button at the top right, and fortunately it worked. The `log_in` element corresponds to the top right log in button. In this case, I have used a conditional wait with a value of 5 seconds, the good thing is that if the element is visible, we don't have to wait until the end (5 seconds). Selenium can find elements that are visible. – A D Nov 17 '21 at 13:36
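
Here is the same top-right log-in step written without the inline .click(), which may make it clearer what WebDriverWait actually returns (this is equivalent to the log_in block in the answer):

wait = WebDriverWait(driver, 5)
# until() returns the located element as soon as the condition passes,
# so the full 5 seconds is only spent when the element never shows up
login_link = wait.until(
    EC.element_to_be_clickable(
        (By.XPATH, "/html/body/div[1]/header/div[1]/div[2]/div[2]/ul/li[3]/a")
    )
)
login_link.click()  # .click() returns None, which is why log_in holds None above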