18

I have a table I need to handle various characters. The characters include Ø, ® etc.

I have set my table to utf-8 as the default collation, all columns use table default, however when I try to insert these characters I get error: Incorrect string value: '\xEF\xBF\xBD' for column 'buyerName' at row 1

My connection string is defined as

string mySqlConn = "server="+server+";user="+username+";database="+database+";port="+port+";password="+password+";charset=utf8;";

I am at a loss as to why I am still seeing errors. Have I missed anything with either the .net connector, or with my MySQL setup?

--Edit--

My (new) C# insert statement looks like:

MySqlCommand insert = new MySqlCommand( "INSERT INTO fulfilled_Shipments_Data " +
     "(amazonOrderId,merchantOrderId,shipmentId,shipmentItemId,"+
     "amazonOrderItemId,merchantOrderItemId,purchaseDate,"+ ...

      VALUES (@amazonOrderId,@merchantOrderId,@shipmentId,@shipmentItemId,"+
      "@amazonOrderItemId,@merchantOrderItemId,@purchaseDate,"+ 
      "paymentsDate,shipmentDate,reportingDate,buyerEmail,buyerName,"+ ...


       insert.Parameters.AddWithValue("@amazonorderId",lines[0]);
       insert.Parameters.AddWithValue("@merchantOrderId",lines[1]); 
       insert.Parameters.AddWithValue("@shipmentId",lines[2]);
       insert.Parameters.AddWithValue("@shipmentItemId",lines[3]);
       insert.Parameters.AddWithValue("@amazonOrderItemId",lines[4]);
       insert.Parameters.AddWithValue("@merchantOrderItemId",lines[5]);
       insert.Parameters.AddWithValue("@purchaseDate",lines[6]);
       insert.Parameters.AddWithValue("@paymentsDate",lines[7]);

 insert.ExecuteNonQuery();

Assuming that this is the correct way to use parametrized statements, it is still giving an error

 "Incorrect string value: '\xEF\xBF\xBD' for column 'buyerName' at row 1"

Any other ideas?

Robert H
  • 11,520
  • 18
  • 68
  • 110
  • 1
    It might be helpful if you post the C# code you use to insert into the table. – Eric Dahlvang Jun 22 '12 at 15:43
  • These three characters look *almost* like the [UTF-8 byte order mark (BOM)](http://en.wikipedia.org/wiki/Byte_Order_Mark). – stakx - no longer contributing Jun 22 '12 at 15:43
  • Added insert statement, unsure how helpful it is because it is a standard insert statement, but if it helps... – Robert H Jun 22 '12 at 16:04
  • @stakx it seems that it may be a BOM, from the http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8 perl programming wikibook: "(U+FFFD encodes to EF BF BD in UTF-8.", why wouldnt the BOM be recognized properly if everything is already encoded in utf8? – Robert H Jun 22 '12 at 16:08
  • hmm interestingly enough from the MySQL 5.5 reference manual: "mysql ignores Unicode byte order mark (BOM) characters at the beginning of input files. Previously, it read them and sent them to the server, resulting in a syntax error. Presence of a BOM does not cause mysql to change its default character set. To do that, invoke mysql with an option such as --default-character-set=utf8.". Would setting my connection string with charset=utf8 accomplish this? or does --default-character-set perform something different? – Robert H Jun 22 '12 at 16:11
  • @stakx - It is the unicode replacement character, and is used to indicate problems when a system is not able to decode a stream of data to a correct symbol. See my answer for more information. – Elian Ebbing Jun 22 '12 at 18:58
  • NEVER, EVER, EVER create a SQL statement like that. That's wide-open to SQL injection. I'm adding this as an answer as that's such a fundamental mistake that you'll probably need to rewrite much of your program. That's not how you supply parameters to a SQL statement and it's not worth anyone answering your question as you should be using parametrized queries which will probably also fix your problem. – mattmanser Jun 22 '12 at 16:14

4 Answers4

35

\xEF\xBF\xBD is the UTF-8 encoding for the unicode character U+FFFD. This is a special character, also known as the "Replacement character". A quote from the wikipedia page about the special unicode characters:

The replacement character � (often a black diamond with a white question mark) is a symbol found in the Unicode standard at codepoint U+FFFD in the Specials table. It is used to indicate problems when a system is not able to decode a stream of data to a correct symbol. It is most commonly seen when a font does not contain a character, but is also seen when the data is invalid and does not match any character:

So it looks like your data source contains corrupted data. It is also possible that you try to read the data using the wrong encoding. Where do the lines come from?

If you can't fix the data, and your input indeed contains invalid characters, you could just remove the replacement characters:

lines[n] = lines[n].Replace("\xFFFD", "");
Elian Ebbing
  • 18,779
  • 5
  • 48
  • 56
  • Information comes from Amazon - so unfortunately I cant change it - looks like replacement it is. Thanks again Elian - once I have confirmed this works I'll mark the question as answered. – Robert H Jun 22 '12 at 19:00
  • It did what I needed it to do (add info to my database for consumption), although I noticed that it will cause some misleading information, for instance if I remove the ø, a customers last name is spelt incorrectly. It seems that there should be a better way. But thanks for providing a solution that will work for now Elian. – Robert H Jun 22 '12 at 19:16
3

Mattmanser is right, never write a sql query by concatenating the parameters directly in the query. An example of parametrized query is:

string lastname = "Doe";
double height = 6.1;
DateTime date = new DateTime(1978,4,18);

var connection = new MySqlConnection(connStr);

try
{
    connection.Open();

    var command = new MySqlCommand(
        "SELECT * FROM tblPerson WHERE LastName = @Name AND Height > @Height AND BirthDate < @BirthDate", connection);

    command.Parameters.AddWithValue("@Name", lastname);
    command.Parameters.AddWithValue("@Height", height);
    command.Parameters.AddWithValue("@Name", birthDate);

    MySqlDataReader reader = command.ExecuteReader();
    ...
}
finally
{
    connection.Close();
}
Elian Ebbing
  • 18,779
  • 5
  • 48
  • 56
  • Thanks for the example - question though - what prevents SQL injection with using parameters if the are just substituted? Also, how would I do an insert using parameters? – Robert H Jun 22 '12 at 16:24
  • 1
    They are not just substituted. The `AddWithValue()` method is strong typed and overloaded. In this case `name` is a string, but it could also be an `int`, `bool` or `DateTime`. This code will make sure that the strings are escaped, and that all other types are correctly formatted. It will therefore prevent SQL injection vulnerabilities. – Elian Ebbing Jun 22 '12 at 16:27
  • Thanks, working on it now - no need to give an example of an insert, I found one I am working off of. – Robert H Jun 22 '12 at 16:33
0

To those who have a similar problem using PHP, try the function utf8_encode($string). It just works!

josliber
  • 43,891
  • 12
  • 98
  • 133
-1

I have this some problem, when my website encoding is utf-u and I tried to send in form CP-1250 string (example taken by listdir dictionaries). I think you must send string encoded like website.

  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](https://stackoverflow.com/questions/ask). To get notified when this question gets new answers, you can [follow this question](https://meta.stackexchange.com/q/345661). Once you have enough [reputation](https://stackoverflow.com/help/whats-reputation), you can also [add a bounty](https://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question. - [From Review](/review/late-answers/33235067) – Milan Egon Votrubec Nov 26 '22 at 07:27