
Problem Statement:

I have a file containing Japanese characters in my Google Cloud Storage bucket. When I download the file and print its contents, the Japanese characters turn into ???.

I am specifying UTF-8 as the charset in the code. Please help.

Code:

Blob blob = storage.get(BlobId.of(blobPath, fileName));
LOGGER.info("Blob received with name {}" ,blob.getName());
return new String(blob.getContent(), StandardCharsets.UTF_8);
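Note that `new String(bytes, StandardCharsets.UTF_8)` never throws: per its Javadoc it always replaces malformed input with the charset's replacement string (U+FFFD for UTF-8), and an output stream that cannot encode a character usually writes `?` in its place. So `???` could originate either in the downloaded bytes or in whatever prints the string. A minimal sketch using only the JDK that checks the bytes strictly; the names `Utf8Check` and `isValidUtf8` are hypothetical, and the byte array stands in for `blob.getContent()`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the whole array is well-formed UTF-8.
    // Unlike new String(bytes, UTF_8), this decoder reports bad input
    // instead of silently substituting the replacement character.
    static boolean isValidUtf8(byte[] content) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for blob.getContent(); replace with the real download.
        byte[] content = "\u6620\u50CF DVD".getBytes(StandardCharsets.UTF_8);
        System.out.println("valid UTF-8: " + isValidUtf8(content));
    }
}
```

If this reports `true` for the real blob but the output still shows `???`, the bytes and the decoded `String` are probably fine and the problem is more likely on the output side (console, log appender, or JVM default charset). If it reports `false`, the file was probably not saved as UTF-8.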
Asked by RRR; edited by Michael Piefel.
  • Please edit the question to give instructions on how someone can reproduce this. Without seeing the data source, it's not possible to know whether the data is what you expect. – Doug Stevenson Dec 23 '20 at 02:16
  • Read the definitive guide at https://balusc.omnifaces.org/2009/05/unicode-how-to-get-characters-right.html – Scary Wombat Dec 23 '20 at 02:32
  • Various components need to be set for utf8; you probably have something set for latin1 (or some other encoding). – Rick James Dec 23 '20 at 04:35
  • @DougStevenson The file in GCS has lines separated by "\n". Each line may or may not contain some Japanese characters. – RRR Dec 23 '20 at 04:42
  • Right, but we can't see what you're *actually* using here. – Doug Stevenson Dec 23 '20 at 04:54
  • Here is a sample line: 映像 4988111254432 KKIT-4433 Fukushima 50 DVD通常版 佐藤浩市 20201106 3800 2926 23 – RRR Dec 23 '20 at 05:34
  • @RickJames Can you please send some pointers regarding what needs to be set? – RRR Dec 23 '20 at 07:14
  • This [SO thread](https://stackoverflow.com/questions/7698794/japanese-character-encoding-in-java) appears to be similar to your question. Remove the encoding method and check the output in terms of its Unicode code points. I am guessing that the Japanese characters are not encoded correctly in the raw text. – Andrew Dec 23 '20 at 14:39
  • In Python you can declare the [encoding of the source file](https://stackoverflow.com/questions/6289474/working-with-utf-8-encoding-in-python-source) like this. In Java you do the same at compile time with the `-encoding` option. – Andrew Dec 23 '20 at 14:49
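Two of the suggestions in the comments, checking the text in terms of its Unicode code points and making sure the printing component is itself set to UTF-8, can be sketched with the JDK alone. This is a minimal sketch under assumptions: `toEscapes` and `utf8Stream` are hypothetical helper names, the string stands in for the one built from `blob.getContent()`, and the logger in the question is not covered, since its appender has its own charset configuration. If the escaped form shows the expected code points rather than `FFFD` or a literal `?`, the `String` itself is correct.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Output {

    // Replaces every non-ASCII UTF-16 unit with a Java-style escape
    // (backslash, 'u', four hex digits) so the result is plain ASCII and
    // reads the same on any console, whatever its encoding.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Wraps a stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default charset (this constructor needs Java 10+).
    // Whether a terminal then renders the characters depends on the terminal.
    static PrintStream utf8Stream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for new String(blob.getContent(), StandardCharsets.UTF_8).
        String text = "\u6620\u50CF 4988111254432 KKIT-4433";
        PrintStream out = utf8Stream(System.out);
        out.println(toEscapes(text)); // ASCII only: shows the actual code units
        out.println(text);            // the characters themselves, as UTF-8 bytes
    }
}
```

Wrapping `System.out` works because `PrintStream` passes the already-encoded bytes through unchanged; the terminal (or IDE console) still has to be set to UTF-8 to display them.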

0 Answers