
I am trying to use PHP's cURL functions to log in to a specific page. Please check the code below. I connect with my email and password at banggood.com and then I would like to go to another private page, but it does not work as expected. I get no errors. Using the code below, I am redirected to this page instead: https://www.banggood.com/index.php?com=account. After I log in, I want to access a private page where my orders are listed. Any help appreciated.

//The username or email address of the account.
define('EMAIL', 'aaa@gmail.com');

//The password of the account.
define('PASSWORD', 'mypassword');

//Set a user agent. This basically tells the server that we are using Chrome ;)
define('USER_AGENT', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36');

//Where our cookie information will be stored (needed for authentication).
define('COOKIE_FILE', 'cookie.txt');

//URL of the login form.
define('LOGIN_FORM_URL', 'https://www.banggood.com/login.html');

//Login action URL. Sometimes, this is the same URL as the login form.
define('LOGIN_ACTION_URL', 'https://www.banggood.com/login.html');


//An associative array that represents the required form fields.
//You will need to change the keys / index names to match the name of the form
//fields.
$postValues = array(
    'email' => EMAIL,
    'password' => PASSWORD
);

//Initiate cURL.
$curl = curl_init();

//Set the URL that we want to send our POST request to. In this
//case, it's the action URL of the login form.
curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);

//Tell cURL that we want to carry out a POST request.
curl_setopt($curl, CURLOPT_POST, true);

//Set our post fields / data (from the array above).
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));

//We don't want any HTTPS errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

//Where our cookie details are saved. This is typically required
//for authentication, as the session ID is usually saved in the cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

//Sets the user agent. Some websites will attempt to block bot user agents.
//Hence the reason I gave it a Chrome user agent.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

//Tells cURL to return the output once the request has been executed.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

//Allows us to set the referer header. In this particular case, we are
//fooling the server into thinking that we were referred by the login form.
curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);

//Do we want to follow any redirects?
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);

//Execute the login request.
curl_exec($curl);

//Check for errors!
if(curl_errno($curl)){
    throw new Exception(curl_error($curl));
}

//We should be logged in by now. Let's attempt to access a password protected page
curl_setopt($curl, CURLOPT_URL, 'https://www.banggood.com/index.php?com=account&t=ordersList');

//Use the same cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

//Use the same user agent, just in case it is used by the server for session validation.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

//We don't want any HTTPS / SSL errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

//Execute the GET request and print out the result.
curl_exec($curl);
stefanosn

2 Answers


You're doing several things wrong:

  1. You're trying to log in before you have a cookie session, but the site requires a cookie session before it will accept the login request.

  2. There's a CSRF token tied to your cookie session, here called at, that you need to parse out of the login page HTML and send with your login request. Your code doesn't fetch it.

  3. Most importantly, there is a captcha image tied to your cookie session that you need to fetch and solve, and whose text you need to add to your login request. Your code ignores it completely.

  4. Your login request needs the header x-requested-with: XMLHttpRequest, but your code isn't adding that header.

  5. Your login request needs the fields com=account and t=submitLogin in the POST data, but your code isn't adding either of them (you try to add them to your URL, but they belong in the POST data, i.e. your $postValues array, not in the URL).

Here's what you need to do:

  • First do a normal GET request to the login page. This gives you a session cookie ID, the CSRF token, and the URL of your captcha image.
  • Store the cookie ID and make sure to send it with all further requests. Then parse out the CSRF token (it's in the HTML, looking like <input type="hidden" name="at" value="5aabxxx5dcac0" />) and the URL of the captcha image (it's different for each cookie session, so don't hardcode it).
  • Then fetch the captcha image and solve it. Put the CSRF token and the captcha answer in your login request's POST data, along with the username, password, com and t, add the HTTP header x-requested-with: XMLHttpRequest, and send the request to https://www.banggood.com/login.html. Then you should be logged in!
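
The parsing step can be tried on its own, without any network access. What follows is only a minimal sketch using plain DOMDocument. It assumes the field name at and the element id get_login_image that this answer mentions; the function name parse_login_page, the sample markup and the captcha file name are made up for illustration.

```php
<?php
// Sketch: extract the CSRF token and the captcha image path from login page HTML.
// "at" and "get_login_image" come from the answer; everything else is hypothetical.
function parse_login_page(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tokenNode = $xpath->query('//input[@name="at"]')->item(0);
    $imageNode = $xpath->query('//*[@id="get_login_image"]')->item(0);

    return [
        // null means the element was not found, e.g. because the page layout changed.
        'token' => $tokenNode instanceof DOMElement ? $tokenNode->getAttribute('value') : null,
        'captcha_src' => $imageNode instanceof DOMElement ? $imageNode->getAttribute('src') : null,
    ];
}

// Made-up sample markup standing in for the real login page.
$sample = '<html><body><form>'
    . '<input type="hidden" name="at" value="5aabxxx5dcac0" />'
    . '<img id="get_login_image" src="example-captcha.png" />'
    . '</form></body></html>';

$parsed = parse_login_page($sample);
```

Returning null for a missing element makes it easy to stop early with a clear message instead of sending a login request that is bound to fail.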

Here's an example implementation using hhb_curl for the web requests (it's a curl_ wrapper taking care of cookies, turning silent curl_ errors into RuntimeExceptions, etc), DOMDocument for parsing out the CSRF token, and deathbycaptcha.com's api for breaking the captcha.

PS: the example code won't work until you provide a real, credited deathbycaptcha.com API username and password in $deathbycaptcha_username and $deathbycaptcha_password. Also, the captcha looked so simple that I think breaking it could be automated if you're sufficiently motivated; I'm not. (Edit: it seems they have improved their captcha since I wrote that, and it looks very difficult now.) Also, the banggood account is just a temporary testing account, so no harm comes of it being compromised, which obviously happens when I post the username/password here.

<?php

declare(strict_types = 1);
require_once ('hhb_.inc.php');
$banggood_username = 'igcpilojhkfhtdz@my10minutemail.com';
$banggood_password = 'igcpilojhkfhtdz@my10minutemail.com';
$deathbycaptcha_username = '?';
$deathbycaptcha_password = '?';

$hc = new hhb_curl ( '', true );
$html = $hc->exec ( 'https://www.banggood.com/login.html' )->getStdOut ();
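// loadHTML() is an instance method; calling it statically is deprecated in PHP 7 and an error in PHP 8.
$domd = new DOMDocument ();
@$domd->loadHTML ( $html );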
$xp = new DOMXPath ( $domd );
$csrf_token = $xp->query ( '//input[@name="at"]' )->item ( 0 )->getAttribute ( "value" );
$captcha_image_url = 'https://www.banggood.com/' . $domd->getElementById ( "get_login_image" )->getAttribute ( "src" );
$captcha_image = $hc->exec ( $captcha_image_url )->getStdOut ();

$captcha_answer = deathbycaptcha ( $captcha_image, $deathbycaptcha_username, $deathbycaptcha_password );

$html = $hc->setopt_array ( array (
        CURLOPT_POST => 1,
        CURLOPT_POSTFIELDS => http_build_query ( array (
                'com' => 'account',
                't' => 'submitlogin',
                'email' => $banggood_username,
                'pwd' => $banggood_password,
                'at' => $csrf_token,
                'login_image_code' => $captcha_answer 
        ) ),
        CURLOPT_HTTPHEADER => array (
                'x-requested-with: XMLHttpRequest' 
        ) 
) )->exec ()->getStdOut ();
var_dump ( // $hc->getStdErr (),
$html );

function deathbycaptcha(string $imageBinary, string $apiUsername, string $apiPassword): string {
    $hc = new hhb_curl ( '', true );
    $response = $hc->setopt_array ( array (
            CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
            CURLOPT_POST => 1,
            CURLOPT_HTTPHEADER => array (
                    'Accept: application/json' 
            ),
            CURLOPT_POSTFIELDS => array (
                    'username' => $apiUsername,
                    'password' => $apiPassword,
                    'captchafile' => 'base64:' . base64_encode ( $imageBinary )  // use base64 because CURLFile requires a file, and i cba with tmpfile() .. but it would save bandwidth.
            ),
            CURLOPT_FOLLOWLOCATION => 0 
    ) )->exec ()->getStdOut ();
    $response_code = $hc->getinfo ( CURLINFO_HTTP_CODE );
    if ($response_code !== 303) {
        // some error
        $err = "DeathByCaptcha api returned \"$response_code\", expected 303, ";
        switch ($response_code) {
            case 403 :
                $err .= " the api username/password was rejected";
                break;
            case 400 :
                $err .= " we sent an invalid request to the api (maybe the API specs have been updated?)";
                break;
            case 500 :
                $err .= " the api had an internal server error";
                break;
            case 503 :
                $err .= " api is temporarily unreachable, try again later";
                break;
            default :
                {
                    $err .= " unknown error";
                    break;
                }
        }
        $err .= ' - ' . $response;
        throw new \RuntimeException ( $err );
    }
    $response = json_decode ( $response, true );
    if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
        return $response ['text']; // sometimes the answer might be available right away.
    }
    $id = $response ['captcha'];
    $url = 'http://api.dbcapi.me/api/captcha/' . urlencode ( $id );
    while ( true ) {
        sleep ( 10 ); // check every 10 seconds
        $response = $hc->setopt ( CURLOPT_HTTPHEADER, array (
                'Accept: application/json' 
        ) )->exec ( $url )->getStdOut ();
        $response = json_decode ( $response, true );
        if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
            return $response ['text'];
        }
    }
}
hanshenrik
  • Thanks for your answer. So i have to do this in php 7? Because in php5 i get lots of errors... – stefanosn Mar 20 '18 at 23:44
  • @stefanosn no you don't, PHP5 is fully capable of doing this, but i just happened to write this in php7, because that's easier, and i only use php7 myself these days (for example, in PHP5 `function f($str){if(!is_string($str)){throw new InvalidArgumentException('argument 1 must be a string, but '.gettype($str).' given!');}}` - and the equivalent PHP7 code is `function f(string $str){}` - scalar type input validation is much easier in php7) - but there is a php5 version of hhb_curl here https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php5.php - but it's mostly unmaintained – hanshenrik Mar 21 '18 at 00:09
  • If i login manually using the browser and then run the script without using the deathbycaptcha function and use php5 hhb_curl will this work? also i get always this warning Warning: Unsupported declare 'strict_types' line 2 – stefanosn Mar 21 '18 at 00:42
  • I get this error also Catchable fatal error: Argument 1 passed to deathbycaptcha() must be an instance of string, string given, called in /Users/stefanos/Sites/... on line 17 and defined in /Users/Sites/... on line 37 – stefanosn Mar 21 '18 at 01:08
  • I removed string in ( ) function and it worked but i get insufficient-funds on deathbycaptcha so if i do not want to pay money for the captcha if i login from the browser and then use the code will it work? If yes what changes do i need to make apart from removing the deathbycaptcha function...? – stefanosn Mar 21 '18 at 01:13
  • and if i make the changes to login how do i go to the webpage https://www.banggood.com/index.php?com=account&t=ordersList to see my orderslist – stefanosn Mar 21 '18 at 01:20
  • @stefanosn strict_types is only supported in PHP7+, so remove it if you're on PHP5. you can skip the login phase entirely by logging in with your browser, and copy your browser's cookies to PHP using CURLOPT_COOKIE or CURLOPT_COOKIEFILE. and i don't think you need to use deathbycaptcha, banggood's captcha looks so simple that i think PWNtcha could break it (a free open-source captcha breaker, see http://caca.zoy.org/wiki/PWNtcha ) - as for checking your order-list post-login, `$html=$hc->exec(' banggood.com/index.php?com=account&t=ordersList')->getStdOut(); echo $html;` – hanshenrik Mar 21 '18 at 08:25
  • So i have to use curl_setopt($ch, CURLOPT_COOKIE, 'tmpfile.tmp'); and curl_setopt($ch, CURLOPT_COOKIEFILE, 'tmpfile.tmp'); to load the cookie from the place it is stored? – stefanosn Mar 22 '18 at 00:40
  • @stefanosn that's not how you use CURLOPT_COOKIE, but correct about CURLOPT_COOKIEFILE, you can do that. check https://curl.haxx.se/libcurl/c/CURLOPT_COOKIE.html about how to use CURLOPT_COOKIE – hanshenrik Mar 22 '18 at 07:43
  • ok thanks a lot i will try to find out how it works. one last question. In the page i have a dropdown menu with three options. Whenever i select a different option i get the new items loaded on the page but the page source does not change remains the same as i have select the first option (default option) . The url does not change also. Although i see the new items. Here is the code of the page at the specific point where you select the items https://pastebin.com/xB1nu146 - I would like to trigger it in php and get new items loaded by code. any help appreciated. – stefanosn Mar 22 '18 at 16:12
  • @stefanosn then the items are fetched by javascript. but wrong, the page source DOES change, but you're probably viewing the `View-Source:`-version, that thing doesn't show the current page source, it shows the page source prior to running javascript. any changes made by javascript to the page, is not shown in the `View-Source:`. to view the current javascript-modified page html, use the DOM inspector from the dev tools (most browsers have some version of this, in Chrome and Firefox, for example, you open it by pressing F12). (comment too long, will continue in next comment) – hanshenrik Mar 23 '18 at 07:56
  • @stefanosn as for triggering those actions in PHP, inspect the network requests made by javascript when you're clicking those buttons, and re-implement them with PHP/Curl. again, most browsers have a "network requests inspection" tool to achieve this. (alternatively, you can use Fiddler Proxy for this.. i use that for analyzing iOS/android apps's network requests, https://www.telerik.com/fiddler ) – hanshenrik Mar 23 '18 at 07:58
  • What can i say. You are one of the best contributors in this community. Thank you so much for your help! – stefanosn Mar 23 '18 at 11:41
  • sorry for the late reply. login works but when i try to get the webpage of my orders after i login in my account using your code it redirects me to my profile automatically ( https://www.banggood.com/index.php?com=account ). i use the code below: $html = $hc->exec ( 'https://www.banggood.com/index.php?com=account&t=ordersList' )->getStdOut (); echo $html; but i get results from this page instead https://www.banggood.com/index.php?com=account any help appreciated – stefanosn Dec 10 '18 at 22:57
  • i have found that this works only $html = $hc->setopt_array ( array ( CURLOPT_POST => 1, CURLOPT_POSTFIELDS => http_build_query ( array ( 'com' => 'account', 't' => 'ordersList', 'email' => $banggood_username, 'pwd' => $banggood_password, 'at' => $csrf_token // 'login_image_code' => $captcha_answer ) ), CURLOPT_HTTPHEADER => array ( 'x-requested-with: XMLHttpRequest' ) ) )->exec ()->getStdOut (); – stefanosn Dec 10 '18 at 23:22
  • i cant find how to navigate to this page actually... https://www.banggood.com/Affiliate-products.html . after i login i try $html = $hc->exec ( 'https://www.banggood.com/Affiliate-products.html' )->getStdOut (); but does not work it redirects me to https://www.banggood.com/Affiliate.html – stefanosn Dec 10 '18 at 23:40
  • @stefanosn i can't help you with that without a `Banggood Affiliate Program`-member account to test on. what does ->getStdErr() return when you get redirected tho? – hanshenrik Dec 11 '18 at 11:54
  • it worked like this $html = $hc->exec ( 'https://www.banggood.com/Affiliate-products.html' )->getStdOut (); $html = $hc->setopt_array ( array ( CURLOPT_POST => 1, CURLOPT_POSTFIELDS => http_build_query ( array ( ) ), CURLOPT_HTTPHEADER => array ( 'x-requested-with: XMLHttpRequest' ) ) )->exec ()->getStdOut (); if i remove this line CURLOPT_POSTFIELDS => http_build_query ( array ( ) ), does not work for some reason i need to define CURLOPT_POSTFIELDS empty to work . – stefanosn Dec 11 '18 at 14:17
  • i understand its difficult for you because you dont have affiliate program. getStdErr() gives me this output when i use CURLOPT_POSTFIELDS to make it work https://pastebin.com/DsVAxv1a – stefanosn Dec 11 '18 at 14:24
  • but still i need to trigger javascript somehow to change the category and results from 15 to 50 results per page. i dont really know how to do because i cant sent postfields there is not com= or t= parameters...is there anything i can do? – stefanosn Dec 11 '18 at 14:28
  • this is what i get back from the banggood.com/Affiliate-products.html ...check https://pastebin.com/dM5tgAmZ but i cant find out how to get 50 results per page or change category...is there a way to trigger javascript from php to change number of results per page or category? – stefanosn Dec 11 '18 at 22:49
  • `is there a way to trigger javascript from php to change number of results per page or category? ` - yeah kind of, you have to emulate the javascript code's XMLHttpRequests with PHP/curl. usually you can find the packets to emulate using the Chrome Dev Tools, but sometimes you'll need [Fiddler Proxy](https://www.telerik.com/fiddler) to figure out how to emulate them in PHP. - btw, you could try to [contact me](https://stackoverflow.com/users/1067003/hanshenrik?tab=profile), when i got time, maybe we could figure it out together – hanshenrik Dec 12 '18 at 12:42
  • Hello there hans hope you are doing ok. If you can help me again i would appreciate it. I have sent you message on facebook profile thanks. – stefanosn Jul 20 '19 at 15:01
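
On the CURLOPT_COOKIE question raised in the comments above: CURLOPT_COOKIE takes the cookie string itself (name=value pairs separated by "; "), while CURLOPT_COOKIEFILE takes the path of a cookie file. A minimal sketch of building that string from cookies copied out of the browser; the helper name build_cookie_header and the cookie names and values are placeholders, not banggood's real cookies.

```php
<?php
// Sketch: turn cookies copied from the browser into the string CURLOPT_COOKIE expects.
// The helper name and the cookie names/values below are hypothetical.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // Use the values exactly as the browser shows them; they are already encoded.
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

$cookieString = build_cookie_header([
    'session_id' => 'abc123',
    'lang' => 'en',
]);

// Usage, left commented out because it needs a live, logged-in session:
// $ch = curl_init('https://www.banggood.com/index.php?com=account&t=ordersList');
// curl_setopt($ch, CURLOPT_COOKIE, $cookieString);       // inline cookie string
// curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');   // or: path to a cookie file
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $html = curl_exec($ch);
```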

Set CURLOPT_FOLLOWLOCATION to 1 or true. You may also need CURLOPT_AUTOREFERER instead of the static referer.

Do any cookies end up in your COOKIEJAR (cookie.txt)? cURL creates the file if it is missing, but PHP needs write permission for that location, and the jar is only written once the handle is closed or destroyed.

If PHP is running on localhost, a network sniffer such as Wireshark or equivalent software can help debug the problem, because the request may still be missing some important HTTP headers.
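
A minimal sketch of these settings on a plain cURL handle, offered as one possible setup rather than a guaranteed fix. It sets CURLOPT_COOKIEFILE next to CURLOPT_COOKIEJAR so that stored cookies are also sent back; the function name session_curl_options and the cookie path are illustrative, and no request is made here.

```php
<?php
// Sketch: options for a cURL handle that keeps its session across requests.
// The function name and the cookie file location are illustrative choices.
function session_curl_options(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here; also enables the cookie engine
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_AUTOREFERER    => true,        // set the Referer header automatically on redirects
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
    ];
}

$options = session_curl_options(sys_get_temp_dir() . '/cookie.txt');

// Applying options does not touch the network; only curl_exec() would.
$ch = curl_init();
$applied = curl_setopt_array($ch, $options);

// When the same handle is reused after a POST, switch it back to GET first:
// curl_setopt($ch, CURLOPT_HTTPGET, true);
// curl_setopt($ch, CURLOPT_URL, 'https://www.banggood.com/index.php?com=account&t=ordersList');
// $html = curl_exec($ch);
```

Without CURLOPT_COOKIEFILE (or another option that enables the cookie engine), a jar-only handle stores cookies on close but has nothing to read them back from on a later run.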

GrowingBrick