I am trying to understand the UTF-8 standard. Based on the description that follows the image below, Wikipedia mentions that the first 128 characters (2^7) are reserved for the ASCII characters. (Image sourced from Wikipedia.)

I want to pass a string as a query parameter to a Cosmos DB with the SQL API, which has a query size limit of 256 KB, and the DB threw an exception because the size exceeded the limit.
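To make the size concern concrete, here is a minimal sketch that counts how many bytes a string occupies once encoded as UTF-8 and compares that with the limit. It assumes the 256 KB limit means 256 * 1024 bytes and that the limit is measured on the UTF-8 form of the query, neither of which is confirmed above. `QuerySizeCheck`, `utf8Length` and `fitsLimit` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class QuerySizeCheck {
    // Assumption: "256 KB" is taken to mean 256 * 1024 bytes.
    static final int ASSUMED_LIMIT_BYTES = 256 * 1024;

    // Number of bytes the string occupies when encoded as UTF-8.
    // The charset is passed explicitly, so the JVM default is not involved.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 form of the query fits within the assumed limit.
    static boolean fitsLimit(String query) {
        return utf8Length(query) <= ASSUMED_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "SELECT * FROM c";   // ASCII only: one byte per character
        String mixed = "caf\u00e9 \u20ac";  // contains e-acute (2 bytes) and the euro sign (3 bytes)

        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        System.out.println(mixed.length() + " chars, " + utf8Length(mixed) + " bytes");
        System.out.println("fits assumed limit: " + fitsLimit(ascii));
    }
}
```

Because the charset is named explicitly in `getBytes`, the count does not change with `file.encoding` or the platform locale.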

Furthermore, when I print the default character set used in my Java 8 app with Charset.defaultCharset(), I get UTF-8 as the output, which also happens to be the value of the "file.encoding" property.

But when I print all properties in my Spring Boot app, I also get the following:

sun.jnu.encoding=Cp1252
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle

According to this answer (What is the default encoding of the JVM?), there are three "default" encodings:

file.encoding:
System.getProperty("file.encoding")
java.nio.charset.Charset:
Charset.defaultCharset()
And the encoding of the InputStreamReader:
InputStreamReader.getEncoding()
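The three values listed above can be printed side by side at runtime. The following is a minimal sketch; `DefaultEncodings` and `readerEncoding` are hypothetical names, and the reader is built over an empty in-memory stream purely so that it can be created without touching a file. `sun.jnu.encoding` is included only because it appears in the property dump above; it is an internal property and may be absent on some JVMs.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class DefaultEncodings {
    // An InputStreamReader created without a charset argument uses the default charset.
    // getEncoding() may report a historical name such as "UTF8" or "Cp1252".
    static String readerEncoding() {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding                   = "
                + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding                = "
                + System.getProperty("sun.jnu.encoding"));
        System.out.println("Charset.defaultCharset()        = "
                + Charset.defaultCharset());
        System.out.println("InputStreamReader.getEncoding() = "
                + readerEncoding());
    }
}
```

Running this inside the application itself (for example from a startup hook) rather than as a separate process shows the values that are actually in effect for that JVM.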

Other users in the same answer suggest:

java -XshowSettings, and that the result is going to be locale-dependent.

I'm unable to come to a conclusion as to which encoding is picked up during execution. Is there a way to check which encoding is in play at runtime? Do the above-mentioned properties influence or override the UTF-8 encoding during build or execution?

user1354825
  • What is the question here? Yes, characters 0-127 only take one byte each in UTF-8. "db threw an exception because the size exceeded the limit": what was the query? Have you asked Cosmos DB support? – Joni Jul 11 '20 at 01:04
  • So the size of the query that gets constructed from the ReactiveCosmosRepository's findBySomethingIn(List s) method will be calculated using the UTF-8 encoding? – user1354825 Jul 11 '20 at 07:52
  • A few points: [1/3] I think you may be confusing/conflating the encoding used for a Java `String` (which is UTF-16) with the encoding used by the JVM. See [this excellent SO answer](https://stackoverflow.com/a/39957184/2985643) for clarification on that. [2/3] Use of the `file.encoding` property is not supported! See [this SO answer](https://stackoverflow.com/a/56046255/2985643) for more information.... – skomisa Jul 11 '20 at 20:10
  • ...[3/3] [This old but authoritative reference](https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4163515) states: _"The preferred way to change the default encoding used by the VM and the runtime system is to **change the locale of the underlying platform before starting your Java program**"_. – skomisa Jul 11 '20 at 20:11
  • Your post contains five separate questions. Can you edit it to focus on a single specific issue? It's perfectly fine to post multiple questions, but please don't bundle them all together within a single post. – skomisa Jul 11 '20 at 20:15
  • @skomisa - edited to focus on encoding scheme only – user1354825 Jul 15 '20 at 15:21
