PHP MySQL Negative Wildcard Character List

Question

To secure a HTML GET method form submission so that users cannot use MySQL wildcards to get the whole table values, I was looking for a list of PHP-MySQL wildcard characters.

For example, this GET method URL takes lowrange & highrange values as 1 and 100 respectively, and generates the appropriate results between that range: example.com/form.php?lowrange=1&highrange=100

But my table values may range from -10000 to +10000, & a smart alec may like to get the whole list by changing the URL as example.com/form.php?lowrange=%&highrange=% (or other special characters like *, ?, etc. etc.)

The basic purpose is to not allow anything that can lead to whole db values getting exposed in one shot.

So far, I've found the following characters to be avoided as in the preg_match:

if(preg_match('/^~`!@#$%\^&\*\(\)\?\[\]\_]+$/',$url)) {
  echo "OK";
}
else {
  echo "NOT OK";
}

Any other characters to be included in the list to completely block the possibility of wildcard based querying?

There are string fields & numbers fields. String field have LIKE matching (where field1 like '%GET-FORM-VALUE%'), & nos. fields have equal to and BETWEEN matching (where field2 = $GET-FORM-VALUE, OR where field3 between $GET-FORM-LOVALUE and $GET-FORM-HIVALUE) $in SQL.

Thank you.

@Machavity - Thanks, I am having multiple fields, & each need to be validated to avoid such special chars. There are string fields and nos. fields, which need to be validated - like strings should not have % or ?, or nos should be allowed between particular ranges only. In summary, let's just take it as numbers (with equal to & between mataching) & strings fields with LIKE matching. — levent001, Jan 20 '23 at 13:19
Wildcards only work if you use LIKE comparison in the first place. Makes no sense to use that operator for your min/max ranges to begin with - `>`/`>=` and `<`/`<=` are the operators you want to use for that. And for text fields - well either you look for absolute _strict_ equality, simply with `=`; or you'd need to give us a proper explanation _how exactly_ you want to search in the first place. — CBroe, Jan 20 '23 at 13:21
Have you checked https://stackoverflow.com/q/3683746/1427878 yet? — CBroe, Jan 20 '23 at 13:28
In light of the edits... why wouldn't a prepared statement suffice here? You can use your values safely that way, regardless of what control characters are put in. — Machavity, Jan 20 '23 at 13:32
@Machavity - Sorry, what's a prepared statement? I'm not familiar with that — levent001, Jan 20 '23 at 13:34
Besides all of the rest of this, you might also want to look into pagination and/or just using `LIMIT` — Chris Haas, Jan 20 '23 at 13:37
See https://stackoverflow.com/a/60496/2370483 for details on prepared statements. Basically you build your query with placeholders and send the data separately. The DB engine can't confuse data for instructions that way — Machavity, Jan 20 '23 at 13:39
@ChrisHaas - Thanks, LIMIT is already there. Main purpose of this Q is to find out all the possible wildcard characters which need to be checked against. (I'm getting a feeling that the real purpose of this Q is getting missed here) :-) — levent001, Jan 20 '23 at 13:40
@levent001, the purpose isn't missed. We don't always answer the question literally but instead try to figure out, to a certain degree, the bigger picture for what a person is doing, and answer that. In your case, trying to exclude certain characters from strings in order to avoid SQL issues raises giant red flags for most of us. You are trying to guard against data leakage, but we see this as a giant security nightmare for SQL injection. No matter how smart of a regex you build, you'll never be 100% confident it always works. Use prepared statements. — Chris Haas, Jan 20 '23 at 13:55
The point isn't being missed. You have certain characters you'd like to exclude. The catch is that you're doing this to prevent SQL injection, and we have better tooling to ensure user-supplied values never get confused for SQL instructions (prepared statements). Far too few people know about them (as you yourself demonstrated). In other words, this could be an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), where your desired solution wouldn't fix the overall problem. We're trying to ensure we give you the help you need, not the help you think you need. — Machavity, Jan 20 '23 at 13:55
Understood @ChrisHaas & Machavity. Please allow me a day to go through the prepared statement topic. I think that may be a better & robust implementation (as you rightly pointed out). Thanks again for your vital inputs. (Pls excuse my limited knowledge, me just starting & there's a lot to learn from you all :-) Let me know in the comment if you think this q need to be deleted (want to retain it for a day to go thru the links & inputs you provided). — levent001, Jan 20 '23 at 14:08
It is probably possible to devise a "bad" query with any subset of regexp characters, even a very small subset. Gain control over the query and add a ``LIMIT``. Even then, they can write a crawler to download the rows a few at a time. — Rick James, Jan 21 '23 at 01:01

score 0 · Answer 1 · answered Jan 21 '23 at 06:20

No doubt that Prepared Statements are the best implementation, & MUST be the norm.

But sometimes, one gets into a "tricky scenario" where it may not be possible to implement it. For example, while working on a client project as external vendor, I was required to do similar implementation, but without having access to the code that made the connection (like, execute_query was not possible to implement, as connection to db was differently set in another config file). So I was forced to implement the "sanitization" of incoming form values.

To that, the only way was to check what data type & values were expected, & what wild characters can be used to exploit the submission. If that is the case with you, then the alternate solution for your situation (String LIKE matching) & (numbers EQUAL TO or BETWEEN 2 given numbers) is as follows:

As soon as form is submitted, at backend first thing to do is:

Put a check for alphabets on String, BLOCK percentage sign & underscore.

if (preg_match('/[^A-Za-z]+/', $str) && !(preg_match('/%/',$strfield))) { // all good...proceed to execute the query } else { // error message }
Similarly, put a check for numbers/floats on number fields, like if (preg_match('/[^0-9]+/', $nofield))

Only if above are satisfied, then proceed to connect to database, and run the query. Add more checks on field to prevent other wild-cards, as needed.

Another option I implemented (may not necessarily fit, but mentioning as food for thought): In addition to above checks, first generate a count of records that fit the query. If count is abnormally high, then either throw error asking user to narrow the range by resubmitting, or display a limited records per page making it cumbersome for them to keep clicking.

Again to reiterate, go for Prepared Statements if you can.

PHP MySQL Negative Wildcard Character List

1 Answers1