1

In VARCHAR and TEXT MySQL database datatypes, I keep seeing  characters appear before every single space after the content is inserted or updated from HTML TEXTAREA fields.

When I vardump the PHP data prior to an insert/update, the  characters aren't there.

I tried converting the database and tables from the default collation of latin1_swedish_ci to utf8_general_ci encoding, then inserting/updating the data again, but the  characters still appeared in the text before each space.

I honestly don't have a very good grasp on collation and character encoding and thought things would be fine when left as default, but then I encountered this issue. How can I prevent these extra characters from appearing?

[edit]: If I update text to the database the first time, the extra characters do not appear. If I load the text from the database field and then update it a second time, then the  characters appear.

Force Flow
  • 714
  • 2
  • 14
  • 34
  • You are not viewing this text as UTF-8. ISO-8859-1 maybe? – AbraCadaver Mar 21 '14 at 17:51
  • collation has nothing to do with this. Collation is basically "alphabetical order" and determines if for example the german ringel-s is sorted after a B or after the S. – Tularis Mar 21 '14 at 17:51
  • UTF-8 *all the way* from A-Z – Funk Forty Niner Mar 21 '14 at 17:52
  • The thing you should check for is whether the data you receive is in the encoding you expect (check the browser encoding when posting the form), and check whether the data returned is viewed by the browser as being in UTF-8 (again: browser encoding) – Tularis Mar 21 '14 at 17:52
  • @Tularis, there is a default character set for each collation and many people use PHPMyAdmin to set the character set by setting the collation. – Marcus Adams Mar 21 '14 at 17:54
  • possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Marcus Adams Mar 21 '14 at 17:54
  • @MarcusAdams true, multiple collations (as in 1 or more) exist per character set. – Tularis Mar 21 '14 at 17:58
  • The PHP function `mb_detect_encoding()` returns UTF-8 for the data I'm trying to insert. There is no charset in the PDO database string. – Force Flow Mar 21 '14 at 18:52
  • On second thought--I entered a string that was 160 characters. the `strlen()` function said it was 181 characters. So maybe these characters actually are on the HTML or PHP side, rather than getting added upon an insert/update. – Force Flow Mar 21 '14 at 19:10

3 Answers3

1

Try to execute SET NAMES 'utf8' on database init. Also check the encoding of your PHP files.

va5ja
  • 473
  • 1
  • 3
  • 8
  • I did some quick reading, and there seemed to be a lot of posts about specifically *not* to use `SET NAMES`. What does it actually do? PHP files are cp1252 by default. – Force Flow Mar 21 '14 at 18:54
  • Ha-Ha! Looks like I figured out what the problem was. The page with the form fields was cp1252, but the library PHP files were UTF-8. When I used a function from that library, the extra characters seemed to have found their way into the string. – Force Flow Mar 21 '14 at 19:16
  • Then again, maybe not. I'm still seeing the characters. Sigh. – Force Flow Mar 21 '14 at 19:44
1

I think you might have some hidden non-ascii in the text/html. Perhaps you should try using

$text= str_replace('`', "'", $text);
$text= preg_replace("/[^(\x20-\x7F)]*/",' ', $text);

to strip non-ascii characters before you insert it into the database.

  • This doesn't strip "all non-ascii text", this *only* strips `\xA0`, which is a single character. Many billions of other non-ascii characters exist (and are often used!) – Tularis Mar 21 '14 at 17:59
  • But there are no special characters on the PHP side of things as far as I can tell. When I did try this, it stripped the spaces (and in turn, stripped the  characters). I'm not sure why this approach would suddenly be necessary. I *don't* want to strip punctuation, keyboard symbols, or spaces. – Force Flow Mar 21 '14 at 18:58
  • Things like this usually happen when you copy contents from text editors like MS Word and the likes or from a webpage that has such characters into a text field – Christian Ezeani Mar 21 '14 at 19:18
  • I'm not copying content from anywhere. I'm simply typing in a TEXTAREA field. – Force Flow Mar 21 '14 at 19:45
1

It turns out that there was problem with the PHP library I was using to generate the input/textarea fields. For the textarea fields, it was replacing all the spaces in the value with nbsp special characters. So, things went a bit wonky when storing and re-storing that value to the database. I disabled that feature.

Thank you everyone for taking the time to offer your suggestions.

Force Flow
  • 714
  • 2
  • 14
  • 34