So while programming sockets using Java and Python, I stumbled upon something weird.

When sending a message using Java to the receiving end of the Python socket, it splits the message into 2 parts, even though this was not intended.

I probably made a mistake somewhere that's causing this problem, but I really don't know what it is.

You can see that Java sends "Test1" in one command and Python only receives parts of that message:

https://i.stack.imgur.com/0827b.png

Python Server Socket Source:

'''
Created on 23 okt. 2014

@author: Rano
'''

#import serial
import socket

HOST = ''
PORT = 1234
running = True;

skt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
skt.bind((HOST, PORT))
skt.listen(1)
conne, addr = skt.accept()

#ser = serial.Serial('/dev/tty.usbmodem411', 9600)

while running == True:
    data = conne.recv(1024)

    if(data == "quit"):
        running = False
        break

    rawrecvstring = data + ""
    recvstring = rawrecvstring.split("|")
    print(recvstring[0])

#_______________________ABOVE IS RECEIVE_______________UNDER IS SEND_______________________#    

#  sendstring = ser.readline()
#   if sendstring != "":
#       conne.sendall(sendstring)


conne.close()
#ser.close()

And the Java Socket send function:

private String message;
private DataOutputStream out;
private BufferedReader in;
private Socket socket;
private boolean socketOnline;

public SocketModule(String IP, int Port){
    try {
        socket = new Socket(IP, Port);
        out = new DataOutputStream(socket.getOutputStream());
        in = new BufferedReader(new InputStreamReader(socket.getInputStream()));   
    } catch (UnknownHostException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
};

void setMessage(String s){
    try {
        out.writeBytes(s);
        out.flush();
        System.out.println("message '" + s + "' sent!\n");
    } catch (IOException e) {
        e.printStackTrace();
    }
};

Any ideas as to why the message is being split?

Rano V
  • Java sends "Test1Test1Test1" with flushing in between. Is `rawrecvstring.split("|")` looking for a `|` character somehow? There are no such characters sent. Also have a look at http://stackoverflow.com/a/20352105/995891 `DataOutputStream#writeBytes(String)` is rarely a good idea – zapl Oct 29 '14 at 22:46
  • @zapl I was planning on sending a string with the "|" char in between the values I needed to know. This way I would be able to split the string and get all the values into the array of recvstring. – Rano V Oct 29 '14 at 22:50

1 Answer

TCP is a stream protocol, not a message protocol.

As far as TCP is concerned, s.send("abc"); s.send("def"); is exactly the same thing as s.send("abcdef"). At the other end of the socket, when you go to receive, it may return as soon as the first send arrives and give you "abc", but it could just as easily return "abcdef", or "a", or "abcd". They're all perfectly legal, and your code has to be able to deal with all of them.

If you want to process entire messages separately, it's up to you to build a protocol that delineates messages—whether that means using some separator that can't appear in the actual data (possibly because, if it does appear in the actual data, you escape it), or length-prefixing each message, or using some self-delineating format like JSON.
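As an illustration of the length-prefixing option, here is a minimal Python 3 sketch. The helper names (send_message, recv_exactly, recv_message) and the 4-byte big-endian header are assumptions made for this example, not part of the original code; the sender and receiver just have to agree on whatever header format you pick.

```python
import socket
import struct

def send_message(sock, payload):
    """Send one message: a 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exactly(sock, n):
    """Call recv repeatedly until exactly n bytes have arrived.

    Returns None if the peer closes the connection first.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            return None  # connection closed before n bytes arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read one length-prefixed message, or None once the stream has ended."""
    header = recv_exactly(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    return recv_exactly(sock, length)
```

On the Java side, DataOutputStream.writeInt also writes a 4-byte big-endian integer, so it could produce a matching header, provided the length you send is the number of bytes that follow rather than the number of characters.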

It looks like you're part-way to building such a thing, because you've got that split('|') for some reason. But you still need to add the rest of it—loop around receiving bytes, adding them to a buffer, splitting any complete messages off the buffer to process them, and holding any incomplete message at the end for the next loop. And, of course, sending the | separators on the other side.

For example, your Java code can do this:

out.writeBytes(s + "|");

Then, on the Python side:

buf = ""
while True:
    data = conne.recv(1024)
    if not data:
        # socket closed
        if buf:
            # but we still had a leftover message
            process_message(buf)
        break
    buf += data
    pieces = buf.split("|")
    buf = pieces.pop()
    for piece in pieces:
        process_message(piece)

That process_message function can handle the special "quit" message, print out anything else, whatever you want. (And if it's simple enough, you can inline it into the two places it's called.)
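To see why the buffering matters, here is the same splitting logic pulled out into a generator that takes a list of chunks instead of a live socket, so it can be run without a network. The name split_messages and the sample chunks are made up for this sketch. It uses Python 3 text strings; with a real Python 3 socket, recv returns bytes, so you would decode them (or use b"|" as the separator) first.

```python
def split_messages(chunks, separator="|"):
    """Yield complete messages from a sequence of received chunks.

    A partial message at the end of one chunk stays in the buffer
    until a later chunk completes it.
    """
    buf = ""
    for data in chunks:
        buf += data
        pieces = buf.split(separator)
        buf = pieces.pop()  # last piece is incomplete (or empty)
        for piece in pieces:
            yield piece
    if buf:
        # the stream ended with an unterminated message left over
        yield buf

# The same bytes, delivered in two different ways by the network.
one_chunk = ["Test1|Test1|"]
odd_chunks = ["Te", "st1|T", "est1", "|"]

print(list(split_messages(one_chunk)))   # ['Test1', 'Test1']
print(list(split_messages(odd_chunks)))  # ['Test1', 'Test1']
```

However the stream gets cut up, the receiver ends up with the same list of messages, which is exactly what a plain recv followed by split cannot guarantee.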

From a comment, it sounds like you wanted to use that | to separate fields within each message, not to separate messages. If so, just pick another character that will never appear in your data and use that in place of | above (and then do the msg.split('|') inside process_message). One really nice option is \n, because then (on the Python side) you can use socket.makefile, which gives you a file-like object that does the buffering for you and just yields lines one by one when you iterate it (or call readline on it, if you prefer).
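A minimal Python 3 sketch of the socket.makefile approach, assuming the Java side terminates each message with "\n" (for example out.writeBytes(s + "\n")) and uses | only between fields. The function name read_records is hypothetical, and UTF-8 is an assumed encoding.

```python
import socket

def read_records(conn):
    """Yield each newline-terminated message as a list of '|'-separated fields.

    conn is a connected socket, such as the one returned by accept().
    The file object does the buffering, so a line is only yielded once
    its terminating newline has arrived (or the connection has closed).
    """
    with conn.makefile("r", encoding="utf-8", newline="\n") as lines:
        for line in lines:
            yield line.rstrip("\n").split("|")
```

In the server above, this would replace the receive loop with something like for fields in read_records(conne): followed by whatever handling you need, including a check for a "quit" message.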

For more detail on this, see Sockets are byte streams, not message streams.

As a side note, I also removed the running flag, because the only time you're ever going to set it, you're also going to break, so it's not doing any good. (But if you are going to test a flag, just use while running:, not while running == True:.)

abarnert
  • Thanks for the info and giving an example! I was planning to send multiple values in 1 string with the "|" character in between them to separate these values and put them in the recvstring array. I will be going to sleep now, but tomorrow I will give it a try. Thanks again for your time, appreciate it! – Rano V Oct 29 '14 at 22:56
  • I just tried your code and it works flawlessly on my local network. Thanks! – Rano V Oct 30 '14 at 12:03
  • @RanoV: Great! So, do you understand the part about using "two levels of dividers" to separate records, and to separate fields within the records? (If not, it may help to think of CSV files: newlines between the rows, commas between the columns.) – abarnert Oct 30 '14 at 18:24
  • I understand the method, but I don't really know why you would need it. Can't you just split the buffer using .split("|") every loop? Is it better performance-wise to use the "two levels of dividers"? I will be running the application on a local network, so I know that the bytestream will reach its destination within milliseconds. Still greatly appreciate your time, thanks! – Rano V Oct 30 '14 at 21:01
  • @RanoV: It's not a performance issue, it's a simplicity (and robustness and future-proofing) issue. Let's say each message is a superhero's name, secret identity, and primary color. If you send each superhero as a separate line, like `Superman|Clark Kent|blue\nIron Man|Tony Stark|gold\n`, each line is a complete superhero to process. If you reuse the same separator, like `Superman|Clark Kent|blue|Iron Man|Tony Stark|gold|`, each field is part of a superhero; you have to then keep track of the state you're in, and accumulate values in some way as you go along. – abarnert Oct 30 '14 at 21:13
  • Thank you for explaining the reason behind it and I guess it would indeed be smarter if I were to implement it. Thanks! – Rano V Oct 30 '14 at 21:19