We have a website that allows a user to enter their car registration in a search box. We just got some errors from the website because some dodgy characters were also pasted into the search box:

vrm=AB55CBX%E2%80%8F

So the number plate is "AB55CBX", but for some reason "%E2%80%8F" has been appended to the string.

My code threw this error when trying to query the database for that number plate:

Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='' in

What should I be doing to clean this up? I'm already using PDO and binding the value before executing it.

James Wilson
  • Those 3 percent-encoded bytes are the UTF-8 encoding of a right-to-left mark (RLM), *"a non-printing character used in the computerized typesetting of bi-directional text"* - Source: https://en.wikipedia.org/wiki/Right-to-left_mark - You need to constrain what you let people enter in the field. See also http://www.w3.org/TR/WCAG20-TECHS/H34.html or Google "right-to-left mark (RLM)" to research this further. – Funk Forty Niner Jul 03 '14 at 12:14
  • Interesting! It's just a text input. Do you mean control it on the front end, back end or both? What do you recommend? Is preg_match an idea or will only allowing letters and numbers still allow the numbers from the RLM to be included? – James Wilson Jul 03 '14 at 12:20
  • Control it from A to Z. You could restrict input to standard letters and numbers using a `preg_match` regex. Something like `if ( !preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $var) )` – Funk Forty Niner Jul 03 '14 at 12:21
  • See this Q&A http://stackoverflow.com/q/4345621/ for letters and numbers only match. One example shows `preg_match("/^[a-zA-Z0-9]+$/", $value)` - The RLM shouldn't make its way in there, since it's not part of the standard typeset. – Funk Forty Niner Jul 03 '14 at 12:24
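  • You could also use `str_replace('%E2%80%8F', '', $var);` if `$var` is still URL-encoded, or `str_replace("\xE2\x80\x8F", '', $var);` if it has already been decoded (PHP decodes `$_GET` values automatically). – Funk Forty Niner Jul 03 '14 at 12:33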
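
The whitelist check suggested in the comments can be sketched as follows. This is a minimal illustration, not the commenter's exact code: the function name `isPlainVrm` is hypothetical, and the pattern assumes a plate may contain only ASCII letters and digits.

```php
<?php
// Hypothetical helper: true only if the whole string is ASCII letters and digits.
function isPlainVrm(string $vrm): bool
{
    // \A and \z anchor the entire string; unlike $, \z does not
    // tolerate a trailing newline.
    return preg_match('/\A[A-Za-z0-9]+\z/', $vrm) === 1;
}

$clean = 'AB55CBX';
$dirty = "AB55CBX\xE2\x80\x8F"; // plate followed by U+200F as UTF-8 bytes

var_dump(isPlainVrm($clean)); // bool(true)
var_dump(isPlainVrm($dirty)); // bool(false)
```

Because the pattern has no `u` modifier, it works byte by byte, so none of the three RLM bytes can match the character class and the dirty value is rejected.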

2 Answers

This looks like URL encoding of the characters. Try urldecode() before querying: http://www.php.net/manual/en/function.urldecode.php
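
To see what decoding actually yields, here is a small sketch using the value from the question. Note that decoding on its own does not remove the mark; it only turns the percent-escapes into the three raw bytes, which still need to be stripped or rejected. Also, PHP already decodes `$_GET` values, so calling `urldecode()` on them a second time is usually unnecessary.

```php
<?php
$raw     = 'AB55CBX%E2%80%8F'; // value as it appears in the query string
$decoded = urldecode($raw);    // plate followed by three raw bytes

// 7 ASCII bytes for the plate plus 3 bytes for U+200F.
echo strlen($decoded), "\n";             // 10

// Hex dump of everything after the seven plate characters.
echo bin2hex(substr($decoded, 7)), "\n"; // e2808f
```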

Toasts

Thanks for your answers. I think preg_replace works so I've used the following:

$vrm = preg_replace('/[^A-Za-z0-9-]/', '', $vrm);
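
As a rough sketch, that line could be wrapped in a small helper. The name `cleanVrm` is hypothetical, and the upper-casing step is an assumption about how plates are stored rather than something stated above.

```php
<?php
// Hypothetical helper; the pattern is the one used in the line above.
function cleanVrm(string $input): string
{
    // Drop every byte that is not an ASCII letter, digit, or hyphen.
    // preg_replace() returns null on error, so fall back to an empty string.
    $vrm = preg_replace('/[^A-Za-z0-9-]/', '', $input) ?? '';

    // Assumption: plates are stored in upper case.
    return strtoupper($vrm);
}

echo cleanVrm("ab55cbx\xE2\x80\x8F"), "\n"; // AB55CBX
echo cleanVrm(' AB55 CBX '), "\n";          // AB55CBX
```

One caveat: this must run on the decoded value. Applied to the still-encoded string `AB55CBX%E2%80%8F`, it would remove only the `%` signs and leave `E2808F` attached to the plate.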
James Wilson