0

Hello, sorry if this question is silly. Suppose I have a String in Java like this:

final String string = "myNastyString";
for(int i=0;i<string.length();i++){
    System.out.println((int)string.charAt(i));
}

For each char (or its int value), I want to know how many bytes it would use in MySQL.

Please be kind, thanks a lot. And yes, I have done some research.

Something like this:

51 3 would use X bytes in a mysqlTable{X}
32   would use X bytes in a mysqlTable{X}
67 C would use X bytes in a mysqlTable{X}
100 d would use X bytes in a mysqlTable{X}
115 s would use X bytes in a mysqlTable{X}
32   would use X bytes in a mysqlTable{X}
70 F would use X bytes in a mysqlTable{X}
114 r would use X bytes in a mysqlTable{X}
233 é would use X bytes in a mysqlTable{X}
65533 � would use X bytes in a mysqlTable{X}
68 D would use X bytes in a mysqlTable{X}
233 é would use X bytes in a mysqlTable{X}
65533 � would use X bytes in a mysqlTable{X}
82 R would use X bytes in a mysqlTable{X}
105 i would use X bytes in a mysqlTable{X}
99 c would use X bytes in a mysqlTable{X}
32   would use X bytes in a mysqlTable{X}
67 C would use X bytes in a mysqlTable{X}
104 h would use X bytes in a mysqlTable{X}
111 o would use X bytes in a mysqlTable{X}
112 p would use X bytes in a mysqlTable{X}
105 i would use X bytes in a mysqlTable{X}
110 n would use X bytes in a mysqlTable{X}
32   would use X bytes in a mysqlTable{X}
40 ( would use X bytes in a mysqlTable{X}
77 M would use X bytes in a mysqlTable{X}
97 a would use X bytes in a mysqlTable{X}
115 s would use X bytes in a mysqlTable{X}
116 t would use X bytes in a mysqlTable{X}
101 e would use X bytes in a mysqlTable{X}
114 r would use X bytes in a mysqlTable{X}
112 p would use X bytes in a mysqlTable{X}
105 i would use X bytes in a mysqlTable{X}
101 e would use X bytes in a mysqlTable{X}
99 c would use X bytes in a mysqlTable{X}
101 e would use X bytes in a mysqlTable{X}
115 s would use X bytes in a mysqlTable{X}
41 ) would use X bytes in a mysqlTable{X}

I mean, for each value (that is, each char), how many bytes will it use in MySQL? I am using the latin1_swedish_ci collation, and I need to validate that every character will fit in my table.

I want to know when a char inside myString would consume more than 1 byte in a MySQL table.

chiperortiz

3 Answers

3

I mean, for each value (that is, each char), how many bytes will it use in MySQL? I am using the latin1_swedish_ci collation, and I need to validate that every character will fit in my table.

MySQL "latin1" is a modified version of windows-1252, meaning it includes all the characters in windows-1252, and also defines mappings for the few characters windows-1252 leaves undefined:

For the “undefined” entries in cp1252, MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.

I don't expect Java to have direct support for "MySQL latin1" because it's not a standard character set. So for each character you can check if it's

  • in the range U+0000 - U+007F (ASCII)
  • in the range U+00A0 - U+00FF (where cp1252 and ISO latin-1 coincide)
  • one of the characters windows-1252 maps to the range 0x80 - 0x9F (see the wikipedia page)
  • U+0081, U+008D, U+008F, U+0090, or U+009D
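The four checks above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`MySqlLatin1`, `fitsInLatin1`, `findUnsupportedIndexes`) are hypothetical, and the table of characters that windows-1252 maps to 0x80 - 0x9F is written out by hand rather than taken from any MySQL or JDK API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks whether a Java char can be stored in a MySQL latin1 column.
final class MySqlLatin1 {

    // The 27 characters that windows-1252 maps to bytes in the range 0x80 - 0x9F.
    private static final String CP1252_HIGH =
          "\u20AC\u201A\u0192\u201E\u2026\u2020\u2021\u02C6\u2030\u0160\u2039\u0152\u017D"
        + "\u2018\u2019\u201C\u201D\u2022\u2013\u2014\u02DC\u2122\u0161\u203A\u0153\u017E\u0178";

    // The five code points MySQL assigns to the bytes windows-1252 leaves undefined.
    private static final String MYSQL_EXTRA = "\u0081\u008D\u008F\u0090\u009D";

    static boolean fitsInLatin1(char c) {
        return c <= 0x7F                        // ASCII
            || (c >= 0xA0 && c <= 0xFF)         // where cp1252 and ISO latin-1 coincide
            || CP1252_HIGH.indexOf(c) >= 0      // cp1252 characters stored as 0x80 - 0x9F
            || MYSQL_EXTRA.indexOf(c) >= 0;     // MySQL-specific additions
    }

    // Returns the index of every char that has no latin1 representation.
    // Surrogate halves of supplementary characters are reported individually.
    static List<Integer> findUnsupportedIndexes(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (!fitsInLatin1(s.charAt(i))) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Accented e, the replacement character U+FFFD, and the euro sign.
        String s = "Fr\u00E9d\u00E9ric \uFFFD Chopin \u20AC";
        System.out.println(findUnsupportedIndexes(s)); // prints [9], the U+FFFD
    }
}
```

Every character that passes the check occupies exactly one byte in latin1, so the validation is about whether a character is representable at all, not about how many bytes it takes.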
Joni
  • Hi Joni, thanks mate. Can you give an example of any character that would use more than 1 byte in latin1? – chiperortiz Jan 08 '19 at 15:12
  • 1
    There is no such character, Latin 1 is a single byte encoding. All 256 characters it supports are encoded with one byte – Joni Jan 08 '19 at 15:15
  • ISO-8859-1 (Latin-1) has only 256 values, so Java will convert any other Unicode character into `?`, a placeholder. Often the placeholder will show up as `�`, as above. Maybe useful for finding those `?`s: `string.getBytes("Windows-1252")`. – Joop Eggen Jan 09 '19 at 08:26
0

The number of bytes per character in your DB does not depend on how your String is stored in Java or in any other client that writes to your DB. It depends on the character set defined for your DB, your table, or the specific column. Once the DB receives the String, it converts it into the charset defined for the DB/table/column. So, to answer your question: the latin1 charset always uses 1 byte per character. BTW, latin1 is better known as ISO-8859-1 (MySQL's own latin1 is based on the closely related windows-1252), which is a very standard charset and is definitely supported by Java. See info on charsets here.

Also, I would recommend switching to a Unicode charset, which supports all characters in all languages. Common ones are UTF-8 (variable width, 1 to 4 bytes per character; MySQL's legacy utf8/utf8mb3 stores at most 3 bytes, utf8mb4 the full 4) and UTF-16 (2 bytes for most characters, 4 bytes for supplementary ones).
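To see the byte counts per character for yourself on the Java side, a small sketch like the following may help. The class and method names (`EncodedSizes`, `byteCount`) are made up for illustration, and the numbers describe Java's own encoders, which is only an approximation of what a given MySQL charset will store.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports how many bytes a string occupies in a given charset.
final class EncodedSizes {

    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 'a', e with acute accent, euro sign, and an emoji outside the Basic Multilingual Plane.
        String[] samples = {"a", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X utf8=%d utf16=%d%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE)); // BE variant: no byte-order mark
        }
        // ISO-8859-1 always produces one byte per representable character.
        // Characters it cannot represent are silently replaced with '?',
        // so a byte count alone will not reveal them.
        System.out.println(byteCount("\u00E9", StandardCharsets.ISO_8859_1));
    }
}
```

Because unmappable characters are replaced rather than rejected, counting bytes in a single-byte charset is not a reliable validation on its own; checking encodability per character is the safer route.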

On the Java side, to analyze your strings and diagnose some charset-related problems, I would suggest the open source library MgntUtils (written by me), which has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:

String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);

For example a String "Hello World" will be converted into

"\u0048\u0065\u006c\u006c\u006f\u0020 \u0057\u006f\u0072\u006c\u0064"

It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article gives you links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written javadoc and source code.
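If you only want to see what such a conversion involves without adding a dependency, a rough plain-Java equivalent could look like the sketch below. This is not the MgntUtils implementation: the class and method names (`UnicodeEscapes`, `toUnicodeSequence`) are hypothetical, and the output format (four lowercase hex digits per UTF-16 code unit) is an assumption based on the example above.

```java
// Hypothetical stand-in, written without MgntUtils: turns each UTF-16 code unit
// of a string into a backslash, the letter u, and four lowercase hex digits.
final class UnicodeEscapes {

    static String toUnicodeSequence(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            // The doubled backslash in the format string produces one literal backslash.
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeSequence("Hello World"));
    }
}
```

Any code unit above 00ff in the output is a character that ISO-8859-1 cannot represent, which makes this kind of dump a quick way to spot problem characters by eye.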

Michael Gantman
-1

From the MySQL Reference Manual:

https://dev.mysql.com/doc/refman/8.0/en/char.html

A column declared as CHAR always uses the declared length, regardless of how long the stored string is. A VARCHAR column instead uses storage that depends on the actual length of the string, up to the declared maximum.

For example, with a single-byte charset such as latin1, the string:

  • "myNastyString" in CHAR(13) uses 13 bytes
  • "myNastyString" in CHAR(20) uses 20 bytes
  • "myNastyString" in VARCHAR(13) uses 13 bytes, plus 1 byte for the length prefix
  • "myNastyString" in VARCHAR(20) uses 13 bytes, plus 1 byte for the length prefix
Matteo Tomai
  • 1
    Hi Matteo, but I want to know how many bytes each char would consume. I mean, m would use 1 and y would also use 1, but I want to know when a char inside myString would consume more than 1 byte. – chiperortiz Jan 08 '19 at 14:29