I would like to use htmlspecialchars to sanitize data before doing a POST request, but I keep getting this error:

url=*** - Uncaught TypeError: http_build_query(): Argument #1 ($data) must be of type array, string given

This is the function that triggers the error, and the code that calls it:

    function makePostRequest($baseURL) {
        $ch = curl_init();
        $clean_post =  htmlspecialchars($POST);
        $data = http_build_query($clean_post);
        curl_setopt($ch, CURLOPT_URL, $baseURL);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        $response = curl_exec($ch);
        curl_close($ch);

        if($e = curl_error($ch)) {
            echo $e;
        } else {
            $json = json_decode($response, true);
            return print_r($json);
        }
    }
    ...
    $response = "";
    switch (getRequestMethod()) {
      case 'GET':
        $response = makeGetRequest($baseURL);
        break;
      case 'POST':
        $response = makePostRequest($baseURL);
        break;
      default:
        echo "There has been an error";
        return;
    }

This is a sample of the data I am sending as part of the POST request:

    data = {
        name:'***',
        password: '***',
        userID: emailAddress,
        userSecret: password
    }
    console.log('data', data)
    jQuery.ajax({
        type: "POST",
        url: "proxy.php?url=***",
        dataType: "json",
        contentType: 'application/x-www-form-urlencoded',
        data: data,
        success: function (data){
            console.log('success', data)
        }
    });
    });
ADyson
Emm
  • `$clean_post` is obviously a string – RiggsFolly Feb 23 '23 at 12:39
  • Note: `htmlspecialchars` takes a string as input (NOT AN ARRAY) and returns a string, even if it gets an error, for example by passing it an array not a string. But it makes no sense doing this anyway – RiggsFolly Feb 23 '23 at 12:40
  • *WHY* do you want to use htmlspecialchars to "sanitize" data before doing a POST request in the first place? It makes no sense – Your Common Sense Feb 23 '23 at 12:40
  • Your usage of `htmlspecialchars()` is inappropriate and potentially problematic. `htmlspecialchars()` is an _output_ filter, only to be used _specifically_ when _outputting_ data into a HTML document. It is designed only to help protect against XSS. It should not be used at any other time, such as when receiving input data; at worst it can change or corrupt your data unnecessarily in that situation. It also has nothing to do with sending an HTTP request. See [when to use htmlspecialchars() function?](https://stackoverflow.com/questions/4882307/when-to-use-htmlspecialchars-function) – ADyson Feb 23 '23 at 12:41
  • @YourCommonSense Wouldn't I need to do that to any data I `POST` to mitigate against XSS attacks? – Emm Feb 23 '23 at 12:42
  • @Emm no, because XSS occurs in a browser. You're not writing this data to a HTML document which is going to be displayed in a browser. Read my previous comment and look at the link, and make sure you understand some basics about what XSS is and how it occurs. – ADyson Feb 23 '23 at 12:42
  • @ADyson If I am sending data inputted client-side to a third party API, shouldn't I do something to avoid the user injecting HTML or SQL? – Emm Feb 23 '23 at 12:50
  • @Emm No, because you have no idea whether the 3rd party will try to put any of the data you provide into either a HTML or SQL context. If they don't, there's nothing for anyone to worry about. And if they do, then it's **their** responsibility to deal with the data accordingly at the right moment. They should be treating your application as a potential threat - you're providing input data they don't control, and they don't know how it got there or where it came from. – ADyson Feb 23 '23 at 12:52
  • It's their responsibility, not yours. You will only corrupt the data – Your Common Sense Feb 23 '23 at 12:52
  • @ADyson your comments are so thorough that I am tempted to reopen this question so you can make them into the answer – Your Common Sense Feb 23 '23 at 12:53
  • And also as explained already, if you prematurely HTML-encode data which isn't going anywhere near a HTML document then you simply risk corrupting it (e.g. imagine I used the character `<` in my password, which ought to be a perfectly legitimate thing to do...using htmlspecialchars on that would alter it without my knowledge, meaning I don't know my real password anymore, and the alteration wouldn't achieve anything useful). And you physically cannot sanitise for SQL injection because that involves writing parameterised queries, and in this case you're not the one writing the query code. – ADyson Feb 23 '23 at 12:54
  • @YourCommonSense Thank you. If you think that's a good idea then I am happy to write them up – ADyson Feb 23 '23 at 12:55
  • @ADyson Thanks! Appreciate the detail – Emm Feb 23 '23 at 12:55
  • I actually once created an account with a large commercial organisation which provides services to the general public, and they silently stripped a `#` character from the password I provided at registration, meaning I couldn't log in properly. Knowing what I know about these processes, I eventually guessed what had happened and tried the same password without that character and it logged me in. So people actually do this stuff in real life and it causes real problems - anyone without relevant experience would have no idea why it wasn't working. – ADyson Feb 23 '23 at 12:59
  • I reported it to their helpdesk and their developers initially had no idea what I was referring to, which was worrying in itself, for an organisation of that size not to have the understanding of what they were doing to the data. Eventually I think they fixed it but it just shows you that we're not merely being pedantic with these remarks. Hopefully it's useful for you. – ADyson Feb 23 '23 at 13:00
  • @ADyson wait, it isn't closed :) So please write it up – Your Common Sense Feb 23 '23 at 13:08
  • @YourCommonSense was literally doing so as you typed :-). Should be there now. – ADyson Feb 23 '23 at 13:08

1 Answer


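The error itself occurs because htmlspecialchars returns a string, while http_build_query requires an array (or object), as the error message points out. (Note also that `$POST` in your code should be `$_POST`: as written it is an undefined variable, so htmlspecialchars receives null and returns an empty string. Had you passed the real `$_POST` array instead, htmlspecialchars would itself have thrown a TypeError on PHP 8, because it only accepts a string.) However, there's no point trying to fix this directly, because you shouldn't be doing it to begin with.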
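To make the type mismatch concrete, here is a small sketch. The `$fields` array is a hypothetical stand-in for `$_POST`; the point is that http_build_query() wants the array itself, and htmlspecialchars() can only ever hand it a string. The caught exception assumes PHP 8, which the error message in the question indicates.

```php
<?php
// Hypothetical form fields standing in for $_POST.
$fields = ['name' => 'Alice', 'password' => 'p<ss'];

// http_build_query() takes the array and URL-encodes each value.
$body = http_build_query($fields);
echo $body, PHP_EOL; // name=Alice&password=p%3Css

// htmlspecialchars() accepts and returns a string, never an array...
$escaped = htmlspecialchars('p<ss');
var_dump(is_string($escaped)); // bool(true)

// ...so passing its result on reproduces the TypeError from the question.
$caught = false;
try {
    http_build_query($escaped);
} catch (TypeError $e) {
    $caught = true;
    echo $e->getMessage(), PHP_EOL;
}
```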

Your usage of htmlspecialchars() is inappropriate and potentially problematic. htmlspecialchars() is an output filter, only to be used specifically when outputting data into a HTML document. It is designed only to help protect against XSS attacks - which are something that can only occur in a HTML document loaded into a web browser with JavaScript enabled.

It should not be used at any other time, such as when receiving input data; at worst it can change or corrupt your data unnecessarily in that situation. It also has nothing to do with sending an HTTP request. See also [when to use htmlspecialchars() function?](https://stackoverflow.com/questions/4882307/when-to-use-htmlspecialchars-function).
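For contrast, a minimal sketch of where htmlspecialchars() does belong: at the moment a value is written into an HTML page. The `renderComment()` function and the sample value are hypothetical, purely for illustration.

```php
<?php
// Escape for the HTML context only, at the point of output.
// renderComment() is a hypothetical helper for illustration.
function renderComment(string $comment): string
{
    return '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';
}

// A hypothetical value that arrived from an API or a database.
$comment = '<script>alert("hi")</script>';
echo renderComment($comment), PHP_EOL;
// <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>

// The original value is untouched, so it can still be stored or sent as-is.
var_dump($comment === '<script>alert("hi")</script>'); // bool(true)
```

The browser displays the text literally instead of running it, which is the only job htmlspecialchars() is designed for.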

You're not writing this data to a HTML document which is going to be displayed in a browser so there is no need to HTML-encode the data, or try to "sanitise" it against anything else (e.g. SQL injection as you mentioned in the comments) that you aren't directly using it for.
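Since the data only needs to be forwarded, makePostRequest can simply pass the incoming fields to http_build_query() unchanged. This is a sketch under a few assumptions: the optional `$fields` parameter (defaulting to `$_POST`) is a hypothetical addition for illustration, the function returns the decoded JSON rather than print_r()'s return value, the cURL error is read while the handle is still in use, and certificate verification is left at cURL's defaults instead of being switched off.

```php
<?php
/**
 * Sketch: forward form fields to $baseURL exactly as received.
 * $fields is a hypothetical extra parameter; it defaults to $_POST.
 */
function makePostRequest(string $baseURL, ?array $fields = null)
{
    // No htmlspecialchars(): the values are sent unchanged.
    $fields = $fields ?? $_POST;

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $baseURL);
    curl_setopt($ch, CURLOPT_POST, true);
    // http_build_query() applies the URL-encoding the request body needs.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // CURLOPT_SSL_VERIFYPEER / CURLOPT_SSL_VERIFYHOST keep their defaults.

    $response = curl_exec($ch);
    if ($response === false) {
        // Read the error while the handle is still valid.
        echo curl_error($ch);
        return null;
    }

    // The handle is released automatically when $ch goes out of scope.
    return json_decode($response, true);
}

// Usage: the caller decides how to display the result, e.g.
// print_r(makePostRequest($baseURL));
```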

You also have no idea whether the 3rd party whose API you are sending it to will try to put any of the data you provide into either a HTML or SQL context, or anything else. If they don't, there's nothing for anyone to worry about. And if they do, then it's their responsibility to deal with the data accordingly at the right moment. They should be treating your application as a potential threat - you're providing input data they don't control, and they don't know how it got there or where it came from, or how you've processed it.
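What "dealing with the data at the right moment" might look like on the receiving side is sketched below. This is hypothetical code standing in for the third party's, and it assumes the pdo_sqlite extension is available: the raw value is stored as received, a parameterised query handles the SQL context, and htmlspecialchars() is applied only when the value is written into HTML.

```php
<?php
// Hypothetical receiving side (the third party's code), assuming pdo_sqlite.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');

// Awkward but perfectly legitimate input, kept exactly as received.
$name = "O'Brien <admin>";

// SQL context: the placeholder keeps the value out of the SQL text.
$insert = $db->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute([$name]);

$stored = $db->query('SELECT name FROM users')->fetchColumn();

// HTML context: escape only when writing the value into a page.
$html = '<td>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</td>';
echo $html, PHP_EOL; // <td>O&#039;Brien &lt;admin&gt;</td>
```

The value round-trips through the database unchanged; each context gets its own treatment only where it is actually used.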

If you prematurely HTML-encode data which isn't going anywhere near a HTML document then you simply risk corrupting it (e.g. imagine I used the character < in my password, which ought to be a perfectly legitimate thing to do...using htmlspecialchars on that would alter it without my knowledge, meaning I don't know my real password anymore, and the alteration wouldn't achieve anything useful). And you physically cannot sanitise for SQL injection here, because that involves writing parameterised queries, and in this case you're not the one writing the query code.
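A short sketch of the password scenario above. Storing a password_hash() of the value is an assumption made only to model "the password the service has on record"; the point is that the mangled value, not the one the user chose, ends up being the real password.

```php
<?php
// A password containing "<" is perfectly legitimate.
$password = 'my<secret';

// "Sanitising" it with htmlspecialchars() silently changes the value.
$mangled = htmlspecialchars($password, ENT_QUOTES, 'UTF-8');
echo $mangled, PHP_EOL; // my&lt;secret

// Suppose the mangled value is what gets hashed and stored at registration.
$storedHash = password_hash($mangled, PASSWORD_DEFAULT);

// The user then logs in with the password they actually chose...
var_dump(password_verify($password, $storedHash)); // bool(false)
// ...and only the mangled form, which they never typed, would work.
var_dump(password_verify($mangled, $storedHash));  // bool(true)
```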


P.S.

I actually once created an account with a large commercial organisation which provides services to the general public, and they silently stripped a # character from the password I provided at registration, meaning I couldn't log in properly. Knowing what I know about these processes, I eventually guessed what had happened and tried the same password without that character and it logged me in. So people actually do this stuff in real life and it causes real problems - anyone without relevant experience would have no idea why it wasn't working.

I reported it to their helpdesk and their developers initially had no idea what I was referring to, which was worrying in itself, for an organisation of that size not to have the understanding of what they were doing to the data. Eventually I think they fixed it, but I leave this anecdote here just to demonstrate that these are real issues, and not just technical pedantry.

ADyson