I am very new to networking and am using Python's socket library to connect to a server that transmits a stream of location data.

Here is the code used.

import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))

try:
    while (1):
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        print(data)
except KeyboardInterrupt:
    s.close()

The issue is that the data arrives in inconsistent forms.

Most of the time it arrives in the correct form, like this:

2016-01-21 22:40:07,441,-84.404153,33.778685,5,3

Yet other times it can arrive split up into two lines like so:

2016-01-21

22:40:07,404,-84.396004,33.778085,0,0

The interesting thing is that when I establish a raw connection to the server using PuTTY, I only ever get the correct form and never the split one. So I imagine that something must be splitting the message, or that PuTTY is doing something to always assemble it correctly.

What I need is for the variable data to contain the proper line always. Any idea how to accomplish this?

Brian HK
  • This happens with TCP sockets. The data could in theory arrive one byte at a time, and your code needs to stitch it back together again. As for ways to achieve this, you might incorporate a header into your data packets. If you make the first 2 or 4 bytes the length of the data, it should be simple to reassemble. – Paul Rooney Jan 21 '16 at 22:57
  • See the answer here for a lengthier description of why this is happening: http://stackoverflow.com/a/1716173/2372812 – Tom Dalton Jan 21 '16 at 23:04
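
A minimal sketch of the length-prefix idea from the first comment, assuming you control both ends of the connection (the server in the question sends newline-terminated text, so this would not apply to it as-is). The 4-byte big-endian header and the helper names `recv_exact`, `send_message` and `recv_message` are made up for illustration.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from sock, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returns b"" once the peer has closed the connection
            raise ConnectionError("socket closed before the message was complete")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send payload prefixed with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_message(sock):
    """Read the 4-byte length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The receiver never relies on one recv() call matching one message; it keeps reading until the announced number of bytes has arrived.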

3 Answers

It is best to think of a socket as a continuous stream of data that may arrive in dribs and drabs, or in a flood.

In particular, it is the receiver's job to break the data up into the "records" that it should consist of; the socket does not magically know how to do this for you. Here the records are lines, so you must read the data and split it into lines yourself.

You cannot guarantee that a single recv will be a single full line. It could be:

  • just part of a line;
  • or several lines;
  • or, most probably, several lines and another part line.

Try something like this (untested):

# we'll use this to collate partial data
data = ""

while True:
    # receive the next batch of data; an empty result means the server closed the connection
    chunk = s.recv(BUFFER_SIZE)
    if not chunk:
        break
    data += chunk.decode('utf-8')

    # split the data into lines
    lines = data.splitlines(keepends=True)

    # the last of these may be a part line
    full_lines, last_line = lines[:-1], lines[-1]

    # print (or do something else!) with the full lines
    for l in full_lines:
        print(l, end="")

    # was the last line received a full line, or just half a line?
    if last_line.endswith("\n"):
        # print it (or do something else!)
        print(last_line, end="")

        # and reset our partial data to nothing
        data = ""
    else:
        # reset our partial data to this part line
        data = last_line
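
The same buffering idea can also be sketched as a generator that can be tried without a network connection. This is a variation, not the code above: it keeps the buffer as bytes and decodes each line only once it is complete, so a multi-byte UTF-8 character split across two chunks cannot break the decode. The name `iter_lines` is hypothetical.

```python
def iter_lines(chunks):
    """Yield complete, decoded lines from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # everything before the last b"\n" is complete; the remainder
        # (possibly empty) stays in the buffer for the next chunk
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield line.rstrip(b"\r").decode("utf-8")
    if buffer:
        # the stream ended without a final newline
        yield buffer.rstrip(b"\r").decode("utf-8")
```

With a connected socket s, the chunks could be supplied as iter(lambda: s.recv(BUFFER_SIZE), b""), which keeps calling recv() until it returns an empty bytes object.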
donkopotamus
  • This is unnecessarily complicated. Simply make the end argument an empty string in the original program and you're good. Also take a look at the argument to endswith; you're missing a character... – SoreDakeNoKoto Jan 21 '16 at 23:56
  • That depends entirely on what you might want to really do with this data ... if instead of just printing it, you want to store it for analysis, or run it through a regex to extract information, then you'll want to collate the full lines. – donkopotamus Jan 22 '16 at 00:04
  • Right... I suppose the question's last line confused things, but from his previous statements and his code, I think all he wants to do is print the response. And even if he wanted to save the lines, it would be far more efficient to collect chunks in a list, since the cost of append() is amortized, and finally call join() on the list. String concatenation, especially when you could get a large response, would be horribly inefficient. – SoreDakeNoKoto Jan 22 '16 at 00:14
  • This is the best solution for what I need! I need to insert the data into a database, and this works best for that. – Brian HK Jan 22 '16 at 00:15

The easiest way to fix your code is to print the received data without adding a new line, which the print statement (Python 2) and the print() function (Python 3) do by default. Like this:

Python 2:

print data,

Python 3:

print(data, end='')

Now print will not add its own newline character to the end of each printed value, and only the newlines present in the received data will be printed. The result is that each line is printed without being split based on the amount of data returned by each socket.recv() call. For example:

from __future__ import print_function
import socket

s = socket.socket()
s.connect(('gump.gatech.edu', 756))

while True:
    data = s.recv(3).decode('utf8')
    if not data:
        break    # socket closed, all data read
    print(data, end='')

Here I have used a very small buffer size of 3 which helps to highlight the problem.

Note that this only fixes the problem from the POV of printing the data. If you wanted to process the data line-by-line then you would need to do your own buffering of the incoming data, and process the line when you receive a new line or the socket is closed.
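
One way to get that buffering without writing it by hand is socket.makefile(), which wraps the socket in a file-like object that can be iterated line by line. The following is only a sketch, assuming a blocking socket carrying UTF-8 text; the function name `read_lines` is made up for illustration.

```python
import socket


def read_lines(sock, encoding="utf-8"):
    """Yield one decoded line at a time from a connected, blocking socket."""
    # makefile() returns a buffered text stream; iterating over it only
    # produces a value once a full line (or end of stream) has arrived
    with sock.makefile("r", encoding=encoding) as stream:
        for line in stream:
            yield line.rstrip("\r\n")
```

Each value yielded is a whole line regardless of how recv() happened to chunk the bytes, so it can be parsed or stored directly. Closing the stream does not close the socket itself.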

mhawke
  • This is perfect for printing but I do indeed need to couple it. I am now thinking I will just log it and be done with it from this end. And then work with the logfile. – Brian HK Jan 22 '16 at 00:10

Edit: socket.recv() is blocking and, like the others said, you won't get exactly one line each time you call the method. The socket waits for data, gets what it can and then returns. When you print this, Python's default end argument adds a newline after every chunk, so you may get more newlines than you expected. To get the raw output from your server, use this:

import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))
try:
    while True:
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        if not data:
            break
        print(data, end="")
except KeyboardInterrupt:
    s.close()
SoreDakeNoKoto
  • Contrary to this answer, `socket.recv` **is blocking**, unless the socket has been explicitly set to non-blocking, or is closed, or there is a signal interrupt. Also, this answer does not address partial lines being received. – donkopotamus Jan 21 '16 at 23:17
  • Wrong. a) `socket.recv` _is_ blocking by default, b) `socket.recv` returns empty string when the connection is closed. c) an empty string _is_ empty. – mhawke Jan 21 '16 at 23:20
  • @donkopotamus Right. It's blocking and I'll be deleting this. However, if it were non-blocking, my answer does explain why there are partial lines. – SoreDakeNoKoto Jan 21 '16 at 23:30
  • @mhawke What I said is that if you print an empty string with the print function, what you get is a newline character and not the 'nothing' you expected. – SoreDakeNoKoto Jan 21 '16 at 23:33
  • @TisteAndii: you've fixed my objections, but you've added an infinite loop as a new problem. – mhawke Jan 21 '16 at 23:50
  • Yeah. Forgot about that. – SoreDakeNoKoto Jan 21 '16 at 23:51