
I'm working on a server, and all of the data is line based. I want to be able to raise an exception when a line exceeds a given length without reading any more data than I have to. For example, client X sends a line that's 16KB long even though the line-length limit is 1024 bytes. After reading more than 1024 bytes, I want to stop reading additional data, close the socket and raise an exception. I've looked through the docs and some of the source code, and I don't see a way to do this without rewriting the _readline method. Is there an easier way that I'm overlooking?

EDIT: The comments made me realize I need to add more information. I know I could write the logic for this myself without much work, but I was hoping to use builtins so I could take advantage of efficient buffering with memoryview, rather than implementing it again or falling back on the naive approach of reading chunks, then joining and splitting as needed without a memoryview.

Eric Pruitt
  • Have you tried reading bytes until you encounter a newline char, and break if you have read too much? – jdi Jul 01 '12 at 04:53
  • You should read using a buffer of 1024 bytes, instead of `readline`, which will just read until a newline character. – Alex W Jul 01 '12 at 04:54
  • @AlexW The problem with that is that I have to manage buffering the data for all the cases when the index of `\n` is < 1023. At that point, I would be rewriting internal buffering logic. That approach will work if I neglect pipelining or efficient buffering, which I may end up doing anyway. – Eric Pruitt Jul 01 '12 at 05:07
  • @jdi That's the naive approach, and it would work, but I was hoping to use the builtin code since efficient buffering with things like memoryview would already be taken care of for me. – Eric Pruitt Jul 01 '12 at 05:08

2 Answers


I don't really like accepting answers that don't really answer the question, so here's the approach I actually ended up taking, and I'll just mark it community wiki or unanswered later if no one has a better solution:

#!/usr/bin/env python3
class LineTooLong(Exception):
    """Raised when a client sends a line longer than maxlinelen."""


class TheThing(object):
    def __init__(self, connection, maxlinelen=8192):
        self.connection = connection
        self.lines = self._iterlines()
        self.maxlinelen = maxlinelen

    def _iterlines(self):
        """
        Yield lines from class member socket object.
        """
        buffered = b''
        while True:
            received = self.connection.recv(4096)
            if not received:
                # EOF: a partial line left in the buffer means the peer hung up mid-line.
                if buffered:
                    raise Exception("Unexpected EOF.")
                # Like readline(), keep returning b'' once the stream is exhausted.
                yield received
                continue

            elif buffered:
                received = buffered + received

            # Everything before the last b'\n' is a complete line; the
            # remainder (possibly empty) is carried over to the next recv().
            *complete, buffered = received.split(b'\n')
            for line in complete:
                line += b'\n'
                if len(line) > self.maxlinelen:
                    raise LineTooLong("Line size: %i" % len(line))
                yield line

            if len(buffered) > self.maxlinelen:
                raise LineTooLong("Too much data in internal buffer.")

    def _readline(self):
        """
        Return next available line from member socket object.
        """
        return next(self.lines)

I haven't bothered comparing the code to be certain, but I'm doing fewer concatenations and splits, so I think mine may be more efficient.
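
For comparison, here is a rough sketch of the memoryview-style buffering mentioned in the question. It avoids concatenation by receiving straight into a preallocated bytearray with `recv_into`. The names `iter_limited_lines` and `LineTooLong` are made up for this example, and I'm assuming the limit counts the trailing newline. Because the buffer holds exactly `maxlinelen` bytes, no more than `maxlinelen` bytes of an unfinished line are ever pulled off the socket.

```python
import socket


class LineTooLong(Exception):
    """Hypothetical exception for lines longer than the limit."""


def iter_limited_lines(sock, maxlinelen=1024):
    """Yield newline-terminated lines from sock, each at most maxlinelen bytes."""
    buf = bytearray(maxlinelen)   # fixed-size buffer, never resized
    view = memoryview(buf)        # lets recv_into() write at an offset
    filled = 0                    # number of valid bytes at the front of buf
    while True:
        if filled == maxlinelen:
            # Buffer is full and holds no newline: the line is over the limit.
            raise LineTooLong("Line exceeds %d bytes" % maxlinelen)
        received = sock.recv_into(view[filled:])
        if not received:
            if filled:
                raise EOFError("Unexpected EOF in the middle of a line.")
            return
        filled += received
        start = 0
        while True:
            end = buf.find(b"\n", start, filled)
            if end == -1:
                break
            yield bytes(view[start:end + 1])
            start = end + 1
        # Move the unfinished tail (if any) to the front of the buffer.
        remaining = filled - start
        if remaining and start:
            view[:remaining] = bytes(view[start:filled])
        filled = remaining
```

Something like `for line in iter_limited_lines(conn, 1024): ...` would then replace the generator above. I have not benchmarked it against the split-based version, so treat the efficiency benefit as unproven.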

Eric Pruitt

I realize your edit clarifies that you want a builtin way to achieve this, but I am not aware of anything existing that gives you that kind of fine-grained control over readline. Still, I thought I would include an example that takes the hand-coded approach with a generator and a split, just for fun.

Reference this other question/answer for a nice generator that reads lines:
https://stackoverflow.com/a/822788/496445

Based on that reader:

server.py

import socket

MAXLINE = 100

def linesplit(sock, maxline=0):
    """Yield (line, err) pairs; err is True when a line exceeds maxline bytes."""
    buf = sock.recv(16)
    done = False
    while not done:
        if b"\n" in buf:
            (line, buf) = buf.split(b"\n", 1)
            err = bool(maxline) and len(line) > maxline
            yield line + b"\n", err
        elif maxline and len(buf) > maxline:
            # mid line check: over the limit before any newline has arrived
            yield buf, True
            return
        else:
            more = sock.recv(16)
            if not more:
                done = True
            else:
                buf = buf + more
    if buf:
        # data left over at EOF without a trailing newline
        err = bool(maxline) and len(buf) > maxline
        yield buf, err


HOST = ''
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
conn, addr = s.accept()
print('Connected by', addr)
for line, err in linesplit(conn, MAXLINE):
    if err:
        print("Error: Line greater than allowed length %d (got %d)"
              % (MAXLINE, len(line)))
        break
    else:
        print("Received data:", line.strip().decode())
conn.close()

client.py

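import socket
import time
import random

HOST = '127.0.0.1'  # the server is assumed to run on the same machine
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
try:
    # Runs until the server hangs up and sendall() raises.
    while True:
        val = b'x' * random.randint(1, 50)
        if random.random() > .5:
            val += b"\n"
        s.sendall(val)
        time.sleep(.1)
finally:
    s.close()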

output

Connected by ('127.0.0.1', 57912)
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
...
Received data: xxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Error: Line greater than allowed length 100 (got 102)

The server loops over the data it receives and checks the length of each line as it assembles it. If a line ever exceeds the specified limit, the generator flags it as an error. I threw this together fairly quickly, so I am sure the checks could be cleaned up, and the read size (16 bytes here) can be changed to control how quickly long lines are detected before too much data is consumed. In the output example above, I only got 2 more bytes than allowed before it stopped.
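
To make the read-size trade-off concrete, here is a small sketch that counts how many bytes get consumed before an over-long line is noticed, for different `recv` sizes. `FakeSocket` and `bytes_read_before_error` are hypothetical helpers written only for this illustration; the fake socket just serves a fixed payload, so no network is involved.

```python
class FakeSocket:
    """Hypothetical stand-in for a socket: serves a fixed payload via recv()."""

    def __init__(self, payload):
        self._payload = payload
        self.consumed = 0  # total bytes handed out so far

    def recv(self, bufsize):
        chunk = self._payload[self.consumed:self.consumed + bufsize]
        self.consumed += len(chunk)
        return chunk


def bytes_read_before_error(payload, maxline, chunk):
    """Return how many bytes were read when a line first exceeds maxline.

    Returns None if the payload ends without any over-long line.
    """
    sock = FakeSocket(payload)
    buf = b""
    while True:
        more = sock.recv(chunk)
        if not more:
            return None
        buf += more
        # Completed lines are checked, then only the unfinished tail is kept.
        *complete, buf = buf.split(b"\n")
        if any(len(line) > maxline for line in complete) or len(buf) > maxline:
            return sock.consumed


# The scenario from the question: a 16KB line against a 1024-byte limit.
payload = b"x" * 16384 + b"\n"
small = bytes_read_before_error(payload, 1024, 16)    # 1040 bytes consumed
large = bytes_read_before_error(payload, 1024, 4096)  # 4096 bytes consumed
```

With 16-byte reads the overshoot is at most 16 bytes past the limit, at the cost of many more `recv` calls; with 4096-byte reads the first chunk alone already exceeds the limit.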

The client just sends random-length data, with a 50/50 chance of a trailing newline.
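
One builtin that may come closer to what the question asks for than the generator above is the optional size argument of `readline()` on the file object returned by `socket.makefile('rb')`. The sketch below asks for one byte more than the limit and treats a result longer than the limit as an error; as far as I can tell, the standard library's `http.server` uses the same trick for request lines. `read_limited_line`, `LineTooLong`, and `demo` are names invented for this example, and note that the buffered reader may still pull a full buffer of data off the socket internally, much like a large `recv` would.

```python
import socket


class LineTooLong(Exception):
    """Hypothetical exception for lines longer than the limit."""


def read_limited_line(stream, maxlinelen):
    """Return the next line from a binary file object, or raise LineTooLong."""
    # readline(n) stops at the newline or after n bytes, whichever comes
    # first, so asking for maxlinelen + 1 bytes is enough to detect overflow.
    line = stream.readline(maxlinelen + 1)
    if len(line) > maxlinelen:
        raise LineTooLong("Line exceeds %d bytes" % maxlinelen)
    return line  # b'' means EOF; a missing b'\n' means EOF mid-line


def demo():
    """Send one short and one over-long line through a local socket pair."""
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"short\n" + b"x" * 64 + b"\n")
        with right.makefile("rb") as stream:
            first = read_limited_line(stream, 16)
            try:
                read_limited_line(stream, 16)
            except LineTooLong:
                return first, True
            return first, False
```

On the server side, the caller would catch `LineTooLong`, close the connection, and move on.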

jdi