I am processing some Twitter data using Java. I read the tweets from a file, do some processing, and print the result to stdout.
The text in the file looks like this:

"RT @Bollogosta319a: #BuyBookSilentSinners \u262fGain Followers\n\u262fRT This\n\u262fMUST FOLLOW ME I FOLLOW BACK\n\u262fFollow everyone who rts\n\u262fGain\n #ANDROID \u2026"

I read it in, and print it out to stdout. The output is supposed to be:

"RT @Bollogosta319a: #BuyBookSilentSinners ☯Gain Followers\n☯RT This\n☯MUST FOLLOW ME I FOLLOW BACK\n☯Follow everyone who rts\n☯Gain\n #ANDROID …"

But my output is like this:

"RT @Bollogosta319a: #BuyBookSilentSinners ?Gain Followers ?RT This ?MUST FOLLOW ME I FOLLOW BACK ?Follow everyone who rts ?Gain #ANDROID ?"

So it seems that I have two problems to deal with:
1. Print the actual Unicode character instead of a "?" in place of each \uXXXX escape sequence.
2. Keep "\n" as literal text in the output instead of having it converted to a line break.

How can I do this? (Dealing with different encodings in Java is driving me crazy.)

Pranjal
Ziqi Liu
  • What have you tried? Can you post your code? – kushagra mittal Oct 25 '16 at 22:31
  • If you want Unicode characters (such as ☯) in your output stream, you need to ensure the stream is using UTF-8. See http://stackoverflow.com/questions/20386335/printing-out-unicode-from-java-code-issue-in-windows-console for how to do this. – Display Name Oct 26 '16 at 00:36
  • Actually, I want to know more about the encoding process behind reading and writing. When the text is read from the file, I don't even know how it is represented. If I try to print it out, it shows up in some other encoding. – Ziqi Liu Oct 26 '16 at 03:21

1 Answer

I don't know how you are parsing the file, but the method you are using seems to interpret escape sequences (such as \n and \u262f). To keep occurrences of \n in the file as literal text, you could replace \n with \\n before the escape sequences are interpreted. The \\ is then converted to a single \, and the n is left alone. Have you tried reading the file with a plain java.io.FileReader? That may be simpler.
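An alternative to pre-escaping \n is to read the line as plain text and decode only the \uXXXX sequences yourself, which leaves \n untouched. The following is a minimal sketch of that idea, not necessarily what your parser does; the class name UnicodeEscapes and the method decodeUnicodeEscapes are made up for illustration. It assumes every escape is exactly four hex digits and does not handle an escaped backslash in front of the u.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes only backslash-u escapes (four hex digits)
// and copies every other character, including a literal backslash-n, as is.
public class UnicodeEscapes {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decodeUnicodeEscapes(String raw) {
        Matcher m = UNICODE_ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the untouched text between the previous match and this one.
            out.append(raw, last, m.start());
            // Turn the four hex digits into the corresponding UTF-16 code unit.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(raw, last, raw.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source, a doubled backslash is one literal backslash,
        // so this string holds the same kind of text as the file.
        String raw = "\\u262fGain Followers\\n\\u2026";
        String decoded = decodeUnicodeEscapes(raw);
        System.out.println(Integer.toHexString(decoded.codePointAt(0))); // prints 262f
        System.out.println(decoded.contains("\\n"));                     // prints true
    }
}
```

The main method prints the code point in hex rather than the symbol itself, so the check does not depend on how the console is configured.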

The Unicode symbols may actually be read correctly; many terminals do not support the full range of Unicode characters and print a placeholder in place of those they cannot display. Perhaps your program prints the character correctly and the terminal simply doesn't know how to render it, so it shows a ? instead.
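A ? can also come from Java itself: when a PrintStream uses a charset that cannot represent a character, the encoder substitutes ? for it. One way to rule that out is to wrap System.out in a PrintStream with an explicit UTF-8 encoding, as the comment above suggests. This is only a sketch, and the names Utf8Stdout and utf8 are invented for the example; the terminal still has to be set to UTF-8 and have a font with the glyph for the symbol to show up.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: wraps any OutputStream in a PrintStream that
// encodes text as UTF-8 instead of the platform default charset.
public class Utf8Stdout {
    static PrintStream utf8(OutputStream target) {
        try {
            // Second argument enables auto-flush on println.
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Written as UTF-8 bytes; rendering depends on the terminal.
        out.println("\u262fGain Followers \u2026");
    }
}
```

If the output still shows ? or garbled bytes after this change, the terminal's own encoding or font is the more likely cause.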

Daniel O