
I have a POST request that I am trying to send using requests in Python, but I get a 403 error. The request works fine through the browser.

POST /ajax-load-system HTTP/1.1
Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999

What I am trying in python is:

import requests
import json

url = 'http://xyz.website.com/ajax-load-system'

payload = {
'Host': 'xyz.website.com',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Language': 'en-GB,en;q=0.5',
'Referer': 'http://xyz.website.com/help-me/ZYc5Yn',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
'Content-Length': '56',
'Cookie': 'csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1',
'Connection': 'close',
'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
'vID': '9999',
}    

headers = {}

r = requests.post(url, headers=headers, data=json.dumps(payload))
print(r.status_code)  

But this is printing a 403 error code. What am I doing wrong here?

I am expecting a JSON response like this:

{"status_message":"Thanks for help.","help_count":"141","status":true}

Gaurav Khe

1 Answer


You are confusing headers and payload, and the payload is not JSON encoded.

These are all headers:

Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

Most of these are handled automatically and don't need to be set manually. requests will set Host for you based on the URL. Accept is set to an acceptable default. Accept-Language is rarely needed in these situations. Referer, unless using HTTPS, is often not set at all or is filtered out for privacy reasons, so sites no longer rely on it being set. Content-Type must actually reflect the contents of your POST (and is not JSON!), so requests sets this for you depending on how you call it. Content-Length must reflect the actual content length, so it is set by requests, which is in the best position to calculate it. Connection should definitely be handled by the library, as you don't want to prevent it from efficiently re-using connections if it can.
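As a rough illustration of those defaults, a requests Session already carries a handful of headers before you add anything. This sketch only inspects them and sends nothing over the network; the exact values depend on the installed requests version.

```python
import requests

# A fresh Session carries default headers that requests manages for you.
session = requests.Session()
default_headers = dict(session.headers)

for name, value in default_headers.items():
    print(f"{name}: {value}")
```

Headers such as Content-Type and Content-Length are not in this list because they are only filled in once the request body is known.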

At best you could set X-Requested-With and User-Agent, but only if the server would not otherwise accept the request. The Cookie header reflects the values of cookies the browser holds. Your script can get its own set of cookies from the server by using a requests Session object to make an initial GET request to the URL named in the Referer header (or another suitable URL on the same site), at which point the server should set cookies on the response, and those are stored in the session for reuse on the POST request. Use that mechanism to get your own CSRF cookie value.

Note the Content-Type header:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

When you pass in a dictionary to the data keyword of the requests.post() function, the library will encode the data to exactly that content type for you.
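To see the difference without contacting the server, you can prepare both variants of the request and inspect them. This is a sketch using the values from the question; nothing is sent.

```python
import json
import requests

url = 'http://xyz.website.com/ajax-load-system'
payload = {
    'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
    'vID': '9999',
}

# Passing the dict itself: requests form-encodes it and sets the headers.
form_request = requests.Request('POST', url, data=payload).prepare()
print(form_request.body)
print(form_request.headers['Content-Type'])
print(form_request.headers['Content-Length'])

# Passing json.dumps(payload): the string is used verbatim as the body,
# so the server receives JSON text instead of form fields.
json_request = requests.Request('POST', url, data=json.dumps(payload)).prepare()
print(json_request.body)
```

The first body is the same 56-byte form string the browser sent; the second is JSON text, which this endpoint does not expect.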

The actual payload is

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999

These are two fields, csrf_test_name and vID, that need to be part of your payload dictionary.

Note that the csrf_test_name value matches the csrf_cookie_name value in the cookies. This is how the site protects itself from cross-site request forgery (CSRF) attacks, where a third party may try to post to the same URL on your behalf. Such a third party would not have access to the same cookies, so its request would be rejected. Your code needs to obtain a new cookie; a proper CSRF implementation would limit the time any CSRF cookie can be re-used.
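The kind of check this implies on the server side can be sketched as follows. This is a hypothetical illustration, not the site's actual code: the function name csrf_token_matches is made up, and a real implementation would likely also enforce expiry.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def csrf_token_matches(cookie_header, form_body):
    """Return True when the posted csrf_test_name equals the csrf_cookie_name cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    fields = parse_qs(form_body)

    cookie = cookies.get('csrf_cookie_name')
    posted = fields.get('csrf_test_name', [None])[0]
    return cookie is not None and posted == cookie.value


# Values taken from the browser request in the question.
cookie_header = ('csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; '
                 'ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl')
good_body = 'csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999'
bad_body = 'csrf_test_name=not-the-cookie-value&vID=9999'

print(csrf_token_matches(cookie_header, good_body))
print(csrf_token_matches(cookie_header, bad_body))
```

A script that posts a token it did not receive as a cookie in the same session would fail a check like this, which is one plausible source of the 403.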

So at a minimum, what is needed to make it all work is:

import requests

# *optional*, the site may not care about these. If they *do* care, then
# they care about keeping out automated scripts and could in future
# raise the stakes and require more 'browser-like' markers. Ask yourself
# if you want to anger the site owners and get into an arms race.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'X-Requested-With': 'XMLHttpRequest',
}

payload = {
    'vID': 9999,
}

url = 'http://xyz.website.com/ajax-load-system'
# the URL from the Referer header, but others at the site would probably
# also work
initial_url = 'http://xyz.website.com/help-me/ZYc5Yn'

with requests.Session() as session:
    # obtain CSRF cookie
    initial_response = session.get(initial_url)
    payload['csrf_test_name'] = session.cookies['csrf_cookie_name']

    # Now actually post with the correct CSRF cookie
    response = session.post(url, headers=headers, data=payload)

If this still causes issues, you'll need to try out two additional headers, Accept and Accept-Language. Take into account that this will mean the site has already thought long and hard about how to keep automated site scrapers out. Consider contacting them and asking if they offer an API option instead.

Martijn Pieters
  • So `headers = {Host: xyz.website.com Accept: application/json, text/javascript, */*; q=0.01 Accept-Language: en-GB,en;q=0.5 .... etc}` and `payload = { "csrf_test_name": "a3f8adecbf11e29c006d9817be96e8d4", "vID": "9999" }` and `r = requests.post(url, headers=headers, data=json.dumps(payload))` But this didn't work too :( – Gaurav Khe Jul 01 '18 at 15:29
  • @GauravKhe: do be clear about *didn't work*. And I already told you the site **doesn't use JSON**. And most of those headers are automatic, and taken care of by requests. – Martijn Pieters Jul 01 '18 at 15:31
  • Thanks got it. Thanks you for the wonderful explanation. I was getting confused but this cleared all the concepts :) – Gaurav Khe Jul 01 '18 at 15:35
  • @Martijn Pieters i tried this method and the response comes as 200. i could not get this line and this is not there in my cookies. Can any help be provided. – Marx Babu Dec 08 '18 at 01:50
  • @MarxBabu: not sure what you are asking. What is 'this line' you could not get? What response body *do* you get? Comments are not a great place to ask questions, however, perhaps you should post a new question instead. – Martijn Pieters Dec 09 '18 at 11:11
  • I am getting response code as 200 and using browser post it works fine . I understand this is not the right place to connect however posting another query also not ok as similar topics are here already .How can i connect with you to share the issue to you directly . i was refering to payload['csrf_test_name'] = session.cookies['csrf_cookie_name'] – Marx Babu Dec 10 '18 at 06:25
  • @GauravKhe i have similar issue where the response coming as 200. I did not get you what did you mentioned as "Thanks got it" above .How did you make it to work.Need some support. – Marx Babu Dec 10 '18 at 10:37
  • @MarxBabu sorry, that’s not really an option. Comments here should only be used to help improve questions and answers, and asking follow-up problems is not supported. Your options are to use the chat rooms, or to post a new question. – Martijn Pieters Dec 10 '18 at 14:27
  • @Martijn Pieters sure i try to connect using chat room,checking how to get chat room here. – Marx Babu Dec 11 '18 at 04:29
  • @MartijnPieters here i have created new query ,Can you please check and help https://stackoverflow.com/questions/53721153/python-post-requests-with-header-and-data-returns-error-code-200-request-not-su – Marx Babu Dec 11 '18 at 09:34
  • Can I put a variable here: `nice-var=1;headers = {'Authorization': 'token nice-var'}`? – Timo Jun 09 '21 at 19:05
  • @Timo of course you can, it’s just Python code constructing a dictionary. If you need to interpolate a token value from a variable, use string formatting or concatenation or any other technique to combine a variable with a string literal. – Martijn Pieters Jun 09 '21 at 20:09
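A minimal sketch of the interpolation described in the last comment, assuming a placeholder token value. The variable is named nice_var here because nice-var is not a valid Python identifier.

```python
# nice_var holds a hypothetical token value used only for illustration.
nice_var = 'abc123'

# An f-string interpolates the variable into the header value.
headers = {'Authorization': f'token {nice_var}'}
print(headers)
```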