stdin piping from Python to Java: UTF-8 encoding error

Question

I am trying to pipe some Unicode characters from Python to Java.

Python code:

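```python
import subprocess

thai = u"ฉันจะกลับบ้านในคืนนี้"
command = ["java", "-jar", "tokenizer.jar", thai]  # thai should arrive in Java as args[0]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()  # wait for Java to finish and collect its output
```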

I plan to pipe them into Java via args[].

The tokenizer's results differ when I run it in Java like this:

```java
public static void main(String[] args)
{
    String thai = "ฉันจะกลับบ้านในคืนนี้";
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}
```

versus this, where the string arrives through `args[0]`:

```java
public static void main(String[] args)
{
    String thai = args[0]; // "ฉันจะกลับบ้านในคืนนี้" (this string should be passed from Python)
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}
```

I believe it to be an encoding issue.
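To illustrate why an encoding mismatch would change what the tokenizer sees, here is a minimal Python 3 sketch (not the original code): the same Thai text becomes 3 bytes per character in UTF-8, and decoding those bytes with a different charset (latin-1 is picked here purely as an example of a wrong guess) silently yields a different string. If the Java process decodes the bytes it receives with a charset other than the one Python used to encode them, `args[0]` would not equal the hard-coded literal.

```python
# Minimal sketch (Python 3): encode with one charset, decode with another.
thai = "ฉันจะกลับบ้านในคืนนี้"

utf8_bytes = thai.encode("utf-8")

# Every Thai code point (U+0E00..U+0E7F) takes 3 bytes in UTF-8.
print(len(thai), len(utf8_bytes))

# Decoding with the same charset round-trips exactly.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a mismatched charset does not raise here (latin-1 maps
# every byte to some character); it just produces mojibake.
mojibake = utf8_bytes.decode("latin-1")

print(round_trip == thai)  # True
print(mojibake == thai)    # False
```

A tokenizer fed the mojibake string would be splitting a completely different sequence of characters, which would be consistent with the differing results described above.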

Please excuse the abbreviated Java code; I don't have the actual code with me right now.

What I am trying to say is this: suppose I pass the following string from Python to Java to be tokenized:

"Hi i am going home"

With the former method (the hard-coded string) I might end up with

"Hi", "i", "am", "going", "home"

while the latter method (the string passed from Python) might yield something like

"Hi i", "am", "going home"

My question is about this difference in the output. I am using English only to illustrate the problem.

You know that your second line encodes the string in UTF-8, then doesn't do anything with the result, and your third line uses the un-encoded string? — user253751, Mar 17 '15 at 07:54
I don't know if this is the *only* issue here (especially in terms of whether Java will choose the right encoding to decode with), but from the Python side this is actually the same problem as [Why doesn't calling a python string method do anything unless you assign its output?](http://stackoverflow.com/questions/9189172/why-doesnt-calling-a-python-string-method-do-anything-unless-you-assign-its-out) (that question is about the `replace` method, but `encode` behaves the same way for the same reason). — lvc, Mar 17 '15 at 07:58
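The point made in the two comments above can be shown with a minimal Python 3 sketch (in Python 2 the types would be `unicode` and `str` instead, but the behavior is the same): `encode` returns a new object and never modifies the string it is called on, so its result has to be assigned.

```python
# Minimal sketch (Python 3): str.encode() returns a new bytes object.
thai = "ฉันจะกลับบ้านในคืนนี้"

thai.encode("utf-8")            # result is thrown away; thai is unchanged
print(type(thai))               # <class 'str'>

encoded = thai.encode("utf-8")  # keep the result by assigning it
print(type(encoded))            # <class 'bytes'>
```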
You've not described your problem very well and your code is currently too short to be helpful. Please [edit] your question when you have access to your code and show us something reproducible. — Duncan Jones, Mar 17 '15 at 08:15
@aceminer First, you aren't doing any piping there; you are just calling a Java program with arguments from a Python script. Second, can you show what you are getting in `args` on the Java side? — user3012759, Mar 17 '15 at 11:54
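For contrast with passing the string as an argument, here is a minimal Python 3 sketch of actually piping it through stdin with an explicit encoding. `pipe_text` and `echo` are hypothetical names; `echo` is a stand-in child process (Python itself, echoing stdin back) so the sketch runs without the jar. In the real case the command would be something like `["java", "-jar", "tokenizer.jar"]`, which assumes the Java program is changed to read its input from stdin rather than `args`.

```python
import subprocess
import sys


def pipe_text(cmd, text, encoding="utf-8"):
    """Write text to cmd's stdin as bytes in `encoding` and return the
    child's stdout decoded with the same encoding (hypothetical helper)."""
    result = subprocess.run(
        cmd,
        input=text.encode(encoding),  # encode explicitly instead of relying on the locale
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,                   # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode(encoding)


thai = "ฉันจะกลับบ้านในคืนนี้"

# Stand-in child process that copies stdin to stdout byte for byte.
echo = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

print(pipe_text(echo, thai) == thai)  # True
```

The design choice is to make the encoding explicit on both ends of the pipe. On the Java side that would presumably mean wrapping `System.in` in an `InputStreamReader` constructed with `StandardCharsets.UTF_8` rather than using the platform default charset; that Java change is an assumption and is not shown here.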
