3

Hi, I have developed a Windows application that lets the user save and view data in a Tamil font. I installed the 'Bamini' font (a Tamil font) and set the text boxes and the DataGridView to use it. I am able to save and retrieve data in Tamil.

The problem is that the Tamil data I enter is saved in the database in an encoded form. For example, if I enter 'இந்தியா' in the textbox and save, it is stored as ",e;j_ah" in the MySQL database (I have set the column character set to utf8). Because of this, when I fetch the data and try to print it, it prints ",e;j_ah" instead of 'இந்தியா'.

Can anyone let me know what I am doing wrong here?

Code that I am using to insert the string:

The value of textBox1 is 'இந்தியா' (since the textbox font is set to the 'Bamini' Tamil font).

     string insertdata = "INSERT INTO tamil (country) VALUES (@cnt)";
     MySqlCommand cmd = new MySqlCommand(insertdata,connection);
     connection.Open();
     cmd.Parameters.AddWithValue("@cnt",textBox1.Text);
     cmd.ExecuteNonQuery();
     connection.Close();

Database affected as follows:

      tablename: Tamil
      Sno   Country
      1     ,e;j_ah

Table Structure:

          CREATE TABLE `tamil` (                              
            `sno` int(11) auto_increment NOT NULL,                        
            `Description` varchar(50) NOT NULL,                          
            `Country` varchar(50) character set utf8 NOT NULL,                         
            KEY `id_sno` (`sno)                             
          ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ; 
Aesha
  • Have you debugged and checked that the string you're retrieving from the textbox has the right data? I'd advise logging the UTF-16 value of each character in the string - for example, `foreach (char c in text) { Console.WriteLine(((ushort) c).ToString("x4")); }` – Jon Skeet Aug 06 '17 at 07:11
  • Next, please show us the code you're using to save the data to the database. – Jon Skeet Aug 06 '17 at 07:11
  • You probably have to UTF-8-decode the byte stream in the DB column before using it as Unicode characters. – SBS Aug 06 '17 at 07:17
  • @user6060561: I tried to encode it, but I still get the same output on the printer: `Encoding.UTF8.GetString(Encoding.GetEncoding(1252).GetBytes(countryname))` – Aesha Aug 06 '17 at 07:30
  • On second look, I see that the stored string ",e;j_ah" isn't the UTF-8 representation of your text, therefore it isn't properly UTF-8-decoded. Do you have other column type options in your DB? – SBS Aug 06 '17 at 07:39
  • @Aesha One more idea: Change the column type to a simple byte blob, UTF-8-encode the string yourself, and store the resulting byte array. I've used this trick successfully with my own MySQL projects. – SBS Aug 06 '17 at 07:49
  • @user6060561: other column types? – Aesha Aug 06 '17 at 07:51
  • @Aesha `Country` BLOB NOT NULL, see https://dev.mysql.com/doc/refman/5.7/en/blob.html – SBS Aug 06 '17 at 08:02

5 Answers

2

Can anyone let me know what I am doing wrong here?

You are using a visually-encoded font.

In this scheme, you press the comma key on the keyboard and type a regular character, U+002C COMMA (,). The text field is set in a font where the shape of the comma makes it look like a Tamil Letter I, but it's still really a comma.

A comma will be stored in the database, and searching tools will match it as a comma; if you pull it back out of the database and display it in the Bamini font then it will look like a Tamil Letter I, but display it in any standard font, like the one you're using to inspect your database, and it will look like a comma.

Visually-encoded fonts are the way we used to cope with language scripts that didn't have a standard encoding, but they should not be used today—chuck Bamini in the bin.

Modern operating systems ship a native Tamil keyboard and font (e.g. Nirmala UI under Windows). With this approach, the user types into a normal text field (with no special font set) and gets a real Unicode character, U+0B87 Tamil Letter I, which should look just the same in the database and behave semantically appropriately.
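
As a rough illustration of the difference, here is a minimal C# sketch that dumps the UTF-16 values of a string and checks whether any of them fall in the Unicode Tamil block (U+0B80 to U+0BFF). The class and method names (`TamilTextCheck`, `ContainsTamilUnicode`, `ToHex`) are made up for this example. In the application, you would pass `textBox1.Text` instead of the hard-coded strings.

```csharp
using System;
using System.Linq;

static class TamilTextCheck
{
    // True if at least one UTF-16 code unit lies in the Unicode Tamil block (U+0B80..U+0BFF).
    public static bool ContainsTamilUnicode(string text) =>
        text.Any(c => c >= '\u0B80' && c <= '\u0BFF');

    // Space-separated hex dump of the UTF-16 code units, for logging.
    public static string ToHex(string text) =>
        string.Join(" ", text.Select(c => ((ushort)c).ToString("X4")));

    static void Main()
    {
        // What a Bamini-styled textbox actually holds: plain ASCII characters.
        string fromBamini = ",e;j_ah";
        // The same word typed with a Unicode Tamil keyboard (escaped to avoid console encoding issues).
        string fromUnicodeKeyboard = "\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE";

        Console.WriteLine(ToHex(fromBamini));                          // 002C 0065 003B 006A 005F 0061 0068
        Console.WriteLine(ContainsTamilUnicode(fromBamini));           // False
        Console.WriteLine(ToHex(fromUnicodeKeyboard));                 // 0B87 0BA8 0BCD 0BA4 0BBF 0BAF 0BBE
        Console.WriteLine(ContainsTamilUnicode(fromUnicodeKeyboard));  // True
    }
}
```

If the check returns false for text that looks Tamil on screen, the characters are ordinary ASCII being drawn with Tamil-shaped glyphs, and no database setting will turn them into Unicode Tamil.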

bobince
1

After a long list of trials, I finally found an alternative solution for printing Tamil characters on my printer. Note: hardware tech support informed me that many thermal printers won't accept Tamil characters sent through the raw printer helper class.

So I designed a Crystal Report and tried printing from it, which was an immediate success. (My printer is a 3-inch thermal printer.)

Aesha
1

Put something like this in the connection string:

id=my_user;password=my_password;database=some_db123;charset=utf8;

And change Description to CHARACTER SET utf8 (or utf8mb4).

See this for more debugging: http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored

Rick James
0

Something goes wrong with the UTF-8 encoding of the string. ",e;j_ah" for sure isn't the UTF-8 representation of your string. I recommend bypassing the UTF-8 feature of the DB altogether and using a simple BLOB type for your "Country" column, which stores a plain byte array of variable length. Then use the UTF-8 codec of .NET to encode and decode yourself, storing the encoded byte array in the BLOB column.

So change the declaration of "Country" to:

`Country` BLOB NOT NULL,   

Use Encoding.UTF8.GetBytes() and Encoding.UTF8.GetString() to encode/decode your Tamil strings.
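
A minimal sketch of that round trip, assuming the string really contains Unicode Tamil characters (the class name `Utf8RoundTrip` and its helper methods are hypothetical, and the database read/write itself is left out):

```csharp
using System;
using System.Text;

static class Utf8RoundTrip
{
    // Encode a string to the byte array that would be stored in the BLOB column.
    public static byte[] Encode(string text) => Encoding.UTF8.GetBytes(text);

    // Decode bytes read back from the BLOB column into a string.
    public static string Decode(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    static void Main()
    {
        // The Tamil word from the question, written as escapes: 7 code points.
        string country = "\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE";

        byte[] stored = Encode(country);
        Console.WriteLine(stored.Length);                // 21 (each of these code points takes 3 bytes)
        Console.WriteLine(BitConverter.ToString(stored)); // hex dump of the stored bytes
        Console.WriteLine(Decode(stored) == country);     // True
    }
}
```

Note that if the text box holds ",e;j_ah" (plain ASCII shown in a Tamil-shaped font), `Encode` will return just those seven ASCII bytes, so this approach only helps when the input is genuine Unicode.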

SBS
  • Did you replace textBox1.Text with Encoding.UTF8.GetBytes(textBox1.Text) in your call to AddWithValue()? If yes, and it doesn't work, try one of the more type-specific "Add" methods of the parameter collection, e.g. the Add() method with a parameter name and a MySqlDbType. – SBS Aug 06 '17 at 09:04
0

Basically, Bamini is not a Unicode-standard font. It has its own encoding, so whenever you read the text back you need to decode it, which in practice means applying the Bamini font to the content. When you try to print, the system does not apply the Bamini font.

So the solution is either to use Unicode fonts instead of Bamini, or to set the Bamini font when printing.

Neechalkaran