0

I'm trying to work my way through some web scraping stuff with the requests library and I've stumbled across encrypted POST data. What I'd like to do is programmatically find a way to retrieve the key names of the POST variables so that I could send my own unfiltered values through. Is there a way to do this and could someone explain it to me or point me in the right direction?

import requests

s = requests.Session('http://thiswebsite.com/login', auth=('username','password'))

# Find some way to determine what k is for POST variables in k,v

params = {
    'INeedThis' : 'So I can pass this',
    'AndAlsoThis' : 'So I can pass this too',
    'AndSoOn' : 'etc'
}

x = post('http://thiswebsite.com/anotherpage', params)
print x.status_code
jchung05
  • 3
  • 2
  • Not sure what you're asking. You're the one passing the "POST variables" in. – Daniel Roseman Dec 08 '16 at 11:23
  • 1
    What do `x.text()` and `x.json()` return? – ettanany Dec 08 '16 at 11:23
  • Are you asking to get all post variable..? – Mr world wide Dec 08 '16 at 11:23
  • open url in Chrome/Firefox and use `DevTools` (built-in in browser) to see what data is send by browser. Then you have to use the same in your code. – furas Dec 08 '16 at 11:28
  • if you what to find programmatically then you have to `get()` page with `form` and use HTML parser (ie. `lxml`, `BeautifulSoup`) to find all tags inside `
    ` tag (like browser do).
    – furas Dec 08 '16 at 11:30
  • @furas I think you have the answer I'm looking for. I'm already using bs4 for something else; I guess I can just make a regex object to search for what I need. Thank you! – jchung05 Dec 08 '16 at 11:41
  • If you use `bs4` so why do you want ot use regex. `bs4` can be more usefull - you have to find all `` and ` – furas Dec 08 '16 at 11:45
  • You're right and I'm new to this. I'll attempt a solid bs4 only approach for my project. – jchung05 Dec 08 '16 at 11:54

1 Answers1

0

If you need names of data sends by POST then you have to first GET this page and find all <input>, <button>, <text> in <form>

For example - login page on facebook.com

import requests

from bs4 import BeautifulSoup

r = requests.get('http://www.facebook.com/')

soup = BeautifulSoup(r.content, 'lxml')

for x in soup.find_all('form'):
    for i in x.find_all(['input', 'button', 'text']):
        print(i.name, i.get('name'))
    print('---')

result:

input lsd
input email
input pass
input None
input persistent
input default_persistent
input timezone
input lgndim
input lgnrnd
input lgnjs
input ab_test_data
input locale
input next
---
input lsd
input firstname
input lastname
input reg_email__
input reg_email_confirmation__
input reg_passwd__
input sex
input sex
button websubmit
input referrer
input asked_to_login
input terms
input ab_test_data
input contactpoint_label
input locale
input reg_instance
input captcha_persist_data
input captcha_session
input extra_challenge_params
input recaptcha_type
input captcha_response
button None
---

As you see there are many inputs so sometimes it is better to use browser manually and see what "names" are send from browser to server :)

furas
  • 134,197
  • 12
  • 106
  • 148