27

I use BufferedReader's readLine() method to read lines of text from a socket.

There is no obvious way to limit the length of the line read.

I am worried that the source of the data can (maliciously or by mistake) write a lot of data without any line feed character, and this will cause BufferedReader to allocate an unbounded amount of memory.

Is there a way to avoid that? Or do I have to implement a bounded version of readLine() myself?

Pratik Butani
daphshez

6 Answers

15

The simplest way to do this will be to implement your own bounded line reader.

Or even simpler, reuse the code from this BoundedBufferedReader class.

Actually, coding a readLine() that works the same as the standard method is not trivial. Dealing with the 3 kinds of line terminator CORRECTLY requires some pretty careful coding. It is interesting to compare the different approaches of the above link with the Sun version and Apache Harmony version of BufferedReader.

Note: I'm not entirely convinced that either the bounded version or the Apache version is 100% correct. The bounded version assumes that the underlying stream supports mark and reset, which is certainly not always true. The Apache version appears to read-ahead one character if it sees a CR as the last character in the buffer. This would break on MacOS when reading input typed by the user. The Sun version handles this by setting a flag to cause the possible LF after the CR to be skipped on the next read... operation; i.e. no spurious read-ahead.

Stephen C
  • 4
    Or you can steal it: http://code.google.com/p/owasp-esapi-java/issues/attachmentText?id=183&aid=-7134623167843514645&name=BoundedBufferedReader.java – Randall Hunt May 11 '11 at 07:23
  • It's probably simpler to build the limit to the amount of data read in at the InputStream level and leave the logic of decoding lines where it is. – Neil Coffey May 11 '11 at 07:53
  • @Neil - yes. See @Tom Hawtin's answer. – Stephen C May 11 '11 at 07:56
  • @GáborLipták - If you would like to track it down, I'll fix it. – Stephen C Sep 12 '13 at 14:13
  • Maybe this was it: http://code.google.com/p/owasp-esapi-java/issues/detail?id=183&q=readline&colspec=ID%20Type%20Status%20Priority%20Milestone%20Component%20Owner%20Summary – Gábor Lipták Sep 12 '13 at 14:15
  • 1
    @GáborLipták - I think that is it. I updated the link to the code itself, which is now on GitHub. Thanks. – Stephen C Sep 12 '13 at 15:35
14

Another option is Apache Commons' BoundedInputStream:

InputStream bounded = new BoundedInputStream(is, MAX_BYTE_COUNT);
BufferedReader reader = new BufferedReader(new InputStreamReader(bounded));
String line = reader.readLine();
Miles
Kevin Litwack
  • 1
    Upvoted for letting somebody else do the hard work for you :D – Coderer May 13 '13 at 09:23
  • isn't this only applicable for when 1 byte == 1 character? when you're dealing with UTF-16 this is effectively halved. – Renan Jul 11 '15 at 00:16
  • 3
    @Renan yes, in the above example MAX_LINE_SIZE is assumed to be defined in bytes. In the scenario described by the OP, the input data is unknown and potentially malicious so you can't really assume anything about the encoding. Therefore a byte-count-based limit seems best. But if your use case has trusted data with a known, multi-byte encoding then you can adjust accordingly. I'll edit the variable name to be more explicit though ;) – Kevin Litwack Aug 09 '15 at 01:50
  • The link isn't working, but otherwise I love this answer. Here is an updated [link](https://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/input/BoundedInputStream.html) – Joshua Richardson Nov 14 '15 at 00:10
  • How do I use it? Please provide detailed steps. It is not readily available in Java itself. – Mohith7548 Aug 14 '18 at 12:57
  • I've tested this option. I was frustrated at first because it didn't throw any Exception when dealing with a huge line file. But then I've noticed it just truncates the line - and that would work for me. – Leonardo Alves Machado Apr 26 '19 at 14:01
  • This option does not do what is asked. It puts a bound to the total file size and not to each line read. The `BoundedInputStream` limits the total bytes read and not the total bytes read per line. – Philipp Feb 01 '21 at 21:26
3

A String can hold at most about 2 billion chars (Integer.MAX_VALUE). If you want a smaller limit, you need to read the data yourself: read one char at a time from the buffered stream until either the limit or a newline char is reached.
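A minimal sketch of that char-at-a-time loop, in case it helps. The class name `BoundedLineReader` and the choice to throw an `IOException` when the limit is hit are my assumptions, not part of any library. It accepts `\n`, `\r` and `\r\n` as terminators by remembering a pending CR in a flag rather than reading ahead, so it needs no mark/reset support.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: reads lines but never buffers more than maxLineLength chars. */
public class BoundedLineReader {
    private final Reader in;
    private final int maxLineLength;
    private boolean skipLF = false; // set after a CR so that an immediately following LF is ignored

    public BoundedLineReader(Reader in, int maxLineLength) {
        // Buffer the source so the single-char reads below stay cheap.
        this.in = (in instanceof BufferedReader) ? in : new BufferedReader(in);
        this.maxLineLength = maxLineLength;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    public String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            if (skipLF) {
                skipLF = false;
                if (c == '\n') continue; // second half of a CRLF pair
            }
            sawAny = true;
            if (c == '\n') return sb.toString();
            if (c == '\r') {
                skipLF = true;
                return sb.toString();
            }
            if (sb.length() >= maxLineLength) {
                throw new IOException("Line longer than " + maxLineLength + " chars");
            }
            sb.append((char) c);
        }
        // End of stream: return a trailing unterminated line, if there was one.
        return sawAny ? sb.toString() : null;
    }
}
```

After the exception the rest of the oversized line is still in the stream, so the caller would normally close the connection rather than keep reading.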

Peter Lawrey
  • BufferedReader is not trivial. Implementing your own is not a great option. – Jeffrey Blattman Nov 17 '15 at 22:24
  • @JeffreyBlattman and yet this is what seantmalone did in the accepted answer. What do you see as the better option? – Peter Lawrey Nov 18 '15 at 08:09
  • 2
    and seantmalone's solution only works for readers that support mark / reset. how about this one? https://github.com/pjpmarques/pmarques.util.io/blob/master/src/main/java/pmarques/util/io/BoundedBufferedReader.java – Jeffrey Blattman Nov 18 '15 at 16:24
3

Perhaps the easiest solution is to take a slightly different approach. Instead of attempting to prevent a DoS by limiting one particular read, limit the entire amount of raw data read. In this way you don't need to worry about using special code for every single read and loop, so long as the memory allocated is proportionate to incoming data.

You can either meter the Reader, or probably more appropriately, the undecoded Stream or equivalent.

Tom Hawtin - tackline
  • How do you propose to do it? What do you mean by *meter* the reader or the stream? – daphshez May 11 '11 at 08:28
  • @Daphna Shezaf Implement `FilterInputStream`, override `read`s, count bytes returned. Something like that. – Tom Hawtin - tackline May 11 '11 at 18:53
  • I think what you suggest only help if the total amount of data received from the socket can be bounded. In my case, I can receive an unbounded number of messages, I just want to bound the message length. – daphshez May 13 '11 at 06:22
  • 1
    @Daphna Shezaf There's nothing to stop you resetting the limit after each message, after reading the line, or at any other arbitrary point in between. – Tom Hawtin - tackline May 13 '11 at 09:15
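For readers wondering what such a metering filter might look like, here is a rough sketch of the idea from the comments above: a `FilterInputStream` that counts the bytes it returns and can be reset after each message. The class name `MeteredInputStream`, the `resetCount()` method and the decision to fail with an `IOException` are all assumptions made for illustration.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical filter: fails once more than `limit` bytes are read since the last reset. */
public class MeteredInputStream extends FilterInputStream {
    private final long limit;
    private long count = 0;

    public MeteredInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    /** Call after each complete message to start a fresh byte budget. */
    public void resetCount() {
        count = 0;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) charge(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so it is metered too.
        int n = super.read(buf, off, len);
        if (n > 0) charge(n);
        return n;
    }

    private void charge(long n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("Read more than " + limit + " bytes without a reset");
        }
    }
}
```

One caveat: an `InputStreamReader` or `BufferedReader` layered on top reads ahead in chunks, so the count can include bytes belonging to the next message. The limit therefore bounds memory use only approximately and should sit comfortably above those buffer sizes.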
1

There are a few ways round this:

  • if the amount of data overall is very small, load data in from the socket into a buffer (byte array, bytebuffer, depending on what you prefer), then wrap the BufferedReader around the data in memory (via a ByteArrayInputStream etc);
  • just catch the OutOfMemoryError, if it occurs; catching this error is generally not reliable, but in the specific case of catching array allocation failures, it is basically safe (but does not solve the issue of any knock-on effect that one thread allocating large amounts from the heap could have on other threads running in your application, for example);
  • implement a wrapper InputStream that will only read so many bytes, then insert this between the socket and BufferedReader;
  • ditch BufferedReader and split your lines via the regular expressions framework (implement a CharSequence whose chars are pulled from the stream, and then define a regular expression that limits the length of lines); in principle, a CharSequence is supposed to be random access, but for a simple "line splitting" regex, in practice you will probably find that successive chars are always requested, so that you can "cheat" in your implementation.
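To illustrate the first option in the list above, here is a small sketch that pulls a capped number of bytes into memory and only then wraps a BufferedReader around them. The class and method names (`CappedLoad`, `loadCapped`) and the UTF-8 charset are assumptions for the example; `InputStream.readNBytes(int)` requires Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CappedLoad {
    /** Reads at most maxBytes from in, failing if more data follows. */
    public static BufferedReader loadCapped(InputStream in, int maxBytes) throws IOException {
        // Blocks until maxBytes have arrived or the stream ends (Java 11+).
        byte[] data = in.readNBytes(maxBytes);
        // If the cap was reached, one extra read tells us whether the input was longer.
        if (data.length == maxBytes && in.read() != -1) {
            throw new IOException("Input exceeds " + maxBytes + " bytes");
        }
        // Every readLine() now works on memory whose size is already bounded.
        return new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Because `readNBytes` waits for either the cap or end of stream, this only suits cases where the sender closes the connection after a small payload. It is not a fit for a long-lived socket carrying many messages.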
Neil Coffey
-2

In BufferedReader, instead of using String readLine(), use int read(char[] cbuf, int off, int len); you can then use boolean ready() to see if you got it all and convert it into a string using the constructor String(char[] value, int offset, int count).

If you don't care about the whitespace and you just want to have a maximum number of characters per line, then the proposal Stephen suggested is really simple:

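import java.io.BufferedReader;
import java.io.IOException;

/**
 * Returns at most bufferSize chars per call. It does not look for line
 * terminators, so it only behaves like readLine() when the source delivers
 * one line per read, such as line-buffered terminal input.
 */
public class BoundedReader extends BufferedReader {

    private final int    bufferSize;
    private final char[] buffer;

    BoundedReader(final BufferedReader in, final int bufferSize) {
        super(in);
        this.bufferSize = bufferSize;
        this.buffer     = new char[bufferSize];
    }

    @Override
    public String readLine() throws IOException {
        int no;

        /* read up to bufferSize chars; null signals end of stream */
        if((no = this.read(buffer, 0, bufferSize)) == -1) return null;
        /* trim() drops the line terminator along with any other surrounding whitespace */
        String input = new String(buffer, 0, no).trim();

        /* discard whatever else is already available; ready() only reports data
           that can be read without blocking, not that the line has ended */
        while(no >= bufferSize && ready()) {
            if((no = read(buffer, 0, bufferSize)) == -1) break;
        }

        return input;
    }

}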

Edit: this is intended to read lines from a user terminal. It blocks until the next line, and returns a bufferSize-bounded String; any further input on the line is discarded.

Neil
  • You can't 'use `ready()` to see if you got it all'. That's not what it's for. See the Javadoc. – user207421 Dec 19 '14 at 03:59
  • You are right; this will, in general, eat some data that you actually could have used. However, in the case where data is sent one line at a time, the `ready()`: 'True if the next `read()` is guaranteed not to block for input, false otherwise,' is exactly what you need. – Neil Dec 23 '14 at 08:36
  • No, it isn't 'exactly what you need'. It tells you whether there is more data available to be read *without blocking.* Not whether 'you got it all'. See the Javadoc. And it doesn't answer the question about unlimited line lengths in any way. – user207421 Aug 09 '15 at 02:03
  • Can you provide an example of the difference between "no more data is available without blocking" and "you got up to the last line?" – Neil Aug 10 '15 at 20:39
  • 1
    "not blocking" means data is already in a buffer. It does not imply that the stream is closed and more data might (not) come in at a later time. Examples include slow network connections or reading from a user terminal. – Zefiro Sep 24 '15 at 17:26