3

I am reading a CSV file using com.opencsv.CSVReader, like below:

String[] headers = csvReader.readNext();

The value of headers looks like this in the debugger:

[Screenshot: debugger view of headers, with the coder field highlighted in yellow]

What is coder here (highlighted in yellow)?

Why is its value 1 for the first index and 0 for all the others?

  • 1.) I need to compare it with another string that has the same value but whose coder is 0, so the equals method returns false. 2.) Can we control or change the value of coder, or restrict Java to use only 0 for it? – Pramendra Raghuwanshi Jul 16 '20 at 06:18

2 Answers

6

The official response is "none of your business", since it's a private member :P Which means it can very well be implementation-specific and not found in other vendors' version of the JVM.

The actual response can be found in the source code for the String class:

The identifier of the encoding used to encode the bytes in value. The supported values in this implementation are

  LATIN1
  UTF16
 

This field is trusted by the VM, and is a subject to constant folding if String instance is constant. Overwriting this field after construction will cause problems.

As to why the first one is different, that depends on how each String is instantiated. The choice of the default value depends on a parameter set by the JVM (in OpenJDK, whether compact strings are enabled). A value different from the default one is a sign that the String was built from another String or from a byte array.

In the first case it means the original String has that coder value itself.

In the second case it depends on the result of a call to the decode method of the StringCoding class, which returns an object whose coder value is set according to the JVM parameter mentioned above and the encoding passed to the String constructor.
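
The coder field itself is private, so the following is only a minimal sketch of the observable side of this, assuming a JDK with compact strings enabled (the default since Java 9). The class name CoderSketch and the helper fitsLatin1 are made up for illustration; the helper checks the condition under which a String can be stored as LATIN1, namely that every char is at most U+00FF. As far as the OpenJDK implementation goes, equal character content should lead to the same coder, so a differing coder is a hint that the content differs, not a cause of equals failing.

```java
import java.nio.charset.StandardCharsets;

public class CoderSketch {

    // Hypothetical helper: true when every char is <= U+00FF, i.e. when the
    // content could be stored one byte per char (the LATIN1 case).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Both strings are decoded from UTF-8 bytes, as when reading a file.
        byte[] narrowBytes = "h\u00E9llo".getBytes(StandardCharsets.UTF_8); // contains U+00E9
        byte[] wideBytes = "h\u20ACllo".getBytes(StandardCharsets.UTF_8);   // contains U+20AC

        String narrow = new String(narrowBytes, StandardCharsets.UTF_8);
        String wide = new String(wideBytes, StandardCharsets.UTF_8);

        System.out.println(fitsLatin1(narrow)); // true: U+00E9 fits in one byte
        System.out.println(fitsLatin1(wide));   // false: U+20AC needs UTF16
    }
}
```

Reading the real coder value would require reflection into a private field, which is why the sketch sticks to the characters themselves.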

Federico klez Culloca
  • It doesn't really explain "Why the value is 1 for first index and 0 for all other?", yet I completely agree with the first part. – Amongalen Jul 15 '20 at 14:35
  • No, it is not the expected answer. One more problem here is that the equals() method of String uses coder, so it will give an unexpected result in this particular case. – Pramendra Raghuwanshi Jul 15 '20 at 15:02
  • @PramendraRaghuwanshi I added some more detail, I'm not sure it's more helpful now, but I hope it's clearer. – Federico klez Culloca Jul 15 '20 at 15:28
  • @PramendraRaghuwanshi: What "particular case"? If you can provide an example where two strings should be equal but are comparing as unequal (or vice versa), please give details of that in the question. – Jon Skeet Jul 15 '20 at 15:32
  • Thank you all for your effort and time. I understand how the JVM handles it, but the values for coder still should not be different. If that does happen, is there any way to make the coder value the same for every index? – Pramendra Raghuwanshi Jul 16 '20 at 06:23
  • @PramendraRaghuwanshi it would help to see the row in your csv that's causing that result. Like a direct copy and paste in the question, without modifications. – Federico klez Culloca Jul 16 '20 at 06:42
  • @FedericoklezCulloca header1,header2,header3,header4,header5 – Pramendra Raghuwanshi Jul 16 '20 at 06:55
  • @PramendraRaghuwanshi if that's a direct copy and paste I see no apparent reason for it to behave like that. Try to open the file with a hex editor and check if you see some unexpected byte around or inside the header – Federico klez Culloca Jul 16 '20 at 07:04
  • @PramendraRaghuwanshi you are focusing on an unimportant implementation detail. Apparently, you have two strings that you think should be equal while they aren’t. Don’t look at the internals, look at the *content*, e.g. char by char. Where is the difference? It shouldn’t be too hard: the first string has a length of eight while all others have a length of seven. Does your CSV file have a byte-order-mark at the beginning that is now part of the string? That would explain everything: the different length, the different coder (U+FEFF requires UTF-16 but might be invisible), your confusion… – Holger Jul 16 '20 at 07:58
  • Yes, @FedericoklezCulloca, you are right. The file has some junk characters. I will attach the screenshot in the next answer, to help anybody else who faces the same issue. – Pramendra Raghuwanshi Jul 16 '20 at 08:40
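
The char-by-char comparison suggested in the comments above could be sketched roughly as follows. The class name HeaderDump, the helper describe, and the hard-coded strings are made up for illustration; in real code the first string would come from headers[0].

```java
public class HeaderDump {

    // Hypothetical helper: renders every char of the string as U+XXXX so that
    // invisible characters show up.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for headers[0] as read from a file that starts with a BOM.
        String fromFile = "\uFEFFheader1";
        String expected = "header1";

        System.out.println(fromFile.length() + " vs " + expected.length()); // 8 vs 7
        System.out.println(describe(fromFile)); // starts with U+FEFF
        System.out.println(describe(expected)); // starts with U+0068
    }
}
```

Comparing the two dumps side by side shows exactly where the strings diverge, without looking at any String internals.
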
3

Federico klez Culloca's explanation of how the JVM handles the String coder is absolutely correct.

The Java String class has two possible values for coder, shown below. The default value is LATIN1 = 0.

@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;

In my case, the file started with some junk characters (a byte order mark in a UTF-8 file). They became part of header1, so the JVM stored that string as UTF16 and its coder became 1. You can see it in the screenshot below.

[Screenshot: the hidden characters at the start of the file]

If you face the same kind of issue, you can open your file in a hex editor to see the hidden characters.

https://hexed.it/
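
Once the byte order mark is confirmed, one possible workaround is to strip it from the first header before comparing. This is a minimal sketch in plain Java, not something opencsv does for you; the class name BomStrip, the helper stripBom, and the sample array standing in for the result of csvReader.readNext() are assumptions for illustration.

```java
public class BomStrip {

    // U+FEFF is what a UTF-8 byte order mark (bytes EF BB BF) decodes to.
    private static final String BOM = "\uFEFF";

    // Hypothetical helper: removes a single leading BOM if present.
    static String stripBom(String s) {
        return (s != null && s.startsWith(BOM)) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Stand-in for: String[] headers = csvReader.readNext();
        String[] headers = {"\uFEFFheader1", "header2", "header3"};

        if (headers != null && headers.length > 0) {
            headers[0] = stripBom(headers[0]);
        }

        System.out.println("header1".equals(headers[0])); // true after stripping
    }
}
```

Only the first field of the first row can carry the mark, so the other headers are left untouched. Saving the file without a BOM avoids the problem at the source.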

  • It’s not just “some junk characters”, it’s a [byte order mark in a UTF-8 file](https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8), just as I suspected in [this comment](https://stackoverflow.com/questions/62917183/what-is-coder-in-string-value#comment111285120_62917302). Some tools, especially on Windows, use this, whereas straight-forward charset decoders do not filter it. – Holger Jul 16 '20 at 08:54