How to convert a string to a byte array in UTF-8 without BOM

Question

I can convert text to a Base64 byte array without any problem. Unfortunately, the converted string needs to start with "PD", which means I should encode it as UTF-8 without a BOM rather than with one. I tried several code samples and everything I could find on the net, but I could not get it to work. Any help is appreciated.

Thank you so much.

Regards Alper

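import java.nio.charset.StandardCharsets;

public static byte[] convertToByteArray(String strToBeConverted) {
    // Encodes every character, including a leading U+FEFF (BOM) if the string contains one.
    return strToBeConverted.getBytes(StandardCharsets.UTF_8);
}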

Maybe relevant: http://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java – Aug 01 '16 at 11:02
The UTF-8 BOM is three bytes (EF BB BF), always, at the beginning of the data. So you could just chop those off / skip over them when using the converted data. – T.J. Crowder, Aug 01 '16 at 11:02
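
For readers who want to try the byte-level approach from the comment above, here is a minimal sketch. The class name `BomBytes` and the method `skipUtf8Bom` are hypothetical names chosen for illustration; the only fact relied on is that U+FEFF encodes to the three bytes EF BB BF in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original question's code.
final class BomBytes {

    /** Returns the data without a leading UTF-8 BOM (EF BB BF); other input is returned unchanged. */
    static byte[] skipUtf8Bom(byte[] data) {
        // Java bytes are signed, so mask with 0xFF before comparing.
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] raw = "\uFEFFabc".getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length);               // 6
        System.out.println(skipUtf8Bom(raw).length);  // 3
    }
}
```

This variant is useful when the bytes already exist (for example, read from a file) and the original string is no longer available.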

score 1 · Accepted Answer · answered Aug 01 '16 at 11:22

return strToBeConverted.replaceFirst("^\uFEFF", "").getBytes(StandardCharsets.UTF_8);

The BOM is Unicode code point U+FEFF.

Removing it means first checking whether it is actually present. String.replaceFirst is relatively costly because it uses regular expression matching, but that is fine here.
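
As a rough illustration of why the BOM matters for the "PD" requirement, the sketch below assumes the text is an XML document beginning with `<?xml` (an assumption; the question only says the Base64 output must start with "PD"). The class name `Utf8NoBom` is hypothetical, and the check uses `String.startsWith` instead of a regular expression.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical wrapper class; convertToByteArray mirrors the method name from the question.
final class Utf8NoBom {
    private static final String BOM = "\uFEFF";

    /** Encodes the text as UTF-8, dropping a single leading U+FEFF if present. */
    static byte[] convertToByteArray(String text) {
        String withoutBom = text.startsWith(BOM) ? text.substring(1) : text;
        return withoutBom.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample input: an XML declaration preceded by a BOM character.
        String xml = "\uFEFF<?xml version=\"1.0\"?>";
        String withBom = Base64.getEncoder().encodeToString(xml.getBytes(StandardCharsets.UTF_8));
        String withoutBom = Base64.getEncoder().encodeToString(convertToByteArray(xml));
        System.out.println(withBom.substring(0, 4));     // 77u/
        System.out.println(withoutBom.substring(0, 4));  // PD94
    }
}
```

The BOM bytes EF BB BF encode to "77u/" in Base64, which is why the output does not start with "PD" until the BOM is removed.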

Joop Eggen

I fixed it... Thank you, Joop. The original file was wrong. I fixed it and ran your code, and now I have a UTF-8 file without a BOM. Cheers – Tonyukuk Aug 01 '16 at 12:48
Like you said, `replaceFirst()` is costly, and unnecessary. It would be simpler to just check whether the first code point in the string is a BOM and, if so, skip it, e.g.: `if ((strToBeConverted.length() > 0) && (strToBeConverted.codePointAt(0) == 0xFEFF)) strToBeConverted = strToBeConverted.substring(1); return strToBeConverted.getBytes(StandardCharsets.UTF_8);` – Remy Lebeau Aug 02 '16 at 23:30
@RemyLebeau thanks for the code; `charAt` would be possible too, but nowadays code points are the more logical choice. Note (for readers): `substring(1)` is cheap here. Since Java 7u6 it copies the remaining characters rather than sharing the char array, but that is still far less work than regular expression matching. – Joop Eggen Aug 03 '16 at 06:10
