8

To send data to a file on my FTP server, I need to create a custom InputStream implementation that reads database data row by row, converts it to CSV and publishes it via its read() methods: from the database, I get a List<Application> object with the data. For each Application object, I want to create a line in the CSV file.

My idea is to load all the data in the constructor and then override the read method. Do I need to override all of InputStream's methods? I tried googling for examples but didn't find any. Could you point me to one?

John Manak
  • 13,328
  • 29
  • 78
  • 119
  • It might be easier to write bytes to a PipedOutputStream, which would then be read from a corresponding PipedInputStream: https://stackoverflow.com/a/23874232/1941359 – AlexO Sep 14 '21 at 15:23

6 Answers

13

You only need to implement the read() method without parameters. All other methods are implemented as calls to that method. For performance reasons (and even ease of implementation) it might be easier to implement the three-argument read() method instead and re-implement the no-args read() method in terms of that method.
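
A minimal sketch of that second approach, assuming the rows have already been formatted as CSV lines and are handed over as an Iterator<String>. The class name CsvLineInputStream and that iterator are hypothetical, not something from the question. The three-argument read() does the work and the no-args read() delegates to it, so only one line is held in memory at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical class: serves pre-formatted CSV lines one at a time.
class CsvLineInputStream extends InputStream {
    private final Iterator<String> lines; // assumed source of CSV lines
    private byte[] current = new byte[0]; // bytes of the line being served
    private int pos = 0;                  // next unread index in current

    CsvLineInputStream(Iterator<String> lines) {
        this.lines = lines;
    }

    // Loads the next line once the current one is used up; false at end of data.
    private boolean fill() {
        while (pos >= current.length) {
            if (!lines.hasNext()) {
                return false;
            }
            current = (lines.next() + "\n").getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return true;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1; // end of stream
        }
        // Copy at most the remainder of the current line (a short read is allowed).
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF; // mask so the result is 0..255
    }
}
```

Callers must be prepared for short reads, since each call returns at most the rest of the current line.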

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
  • 4
    According to the documentation, you need to implement the available() method also. – Mr Ed Jan 28 '12 at 22:06
  • 1
    Mr. Ed: it clearly says "should". And since you should not rely on `available()` to know how many bytes to read anyway, I'd say that the stream will work just fine with the default implementation. – Joachim Sauer Jan 29 '12 at 15:47
  • 3
    The default implementation of available() returns zero. If some function relied on that to know if they could read and get something or not, that function won't work unless available() works. – Mr Ed Jan 31 '12 at 07:29
  • 2
    @Mr Ed: yes, that's true. But such a function would be broken: `available()` is only defined to be an estimate. And relying on an estimate to be accurate is a mistake, in my opinion. – Joachim Sauer Jan 31 '12 at 11:46
  • 3
    The latest documentation says: "...The available method for class InputStream always returns 0. This method should be overridden by subclasses. Returns: an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking or 0 when it reaches the end of the input stream." Anyway, the only reason I mentioned it in the first place was because my program didn't work because I didn't define that method. – Mr Ed Jan 31 '12 at 13:09
  • My actual experience here is that if you only implement read(), BufferedInputStream will return short reads if you try to read past 256 characters; when I implemented available(), it started working correctly. – stu Aug 17 '20 at 01:26
9

Some very important points which I met when implementing my InputStream.

  1. Override available(). As the Javadoc says:

    The available method for class InputStream always returns 0. This method should be overridden by subclasses.

    If you don't override this method, any attempt to test whether the stream is readable will return false. For example, if you feed your InputStream to an InputStreamReader, the reader will always return false when you invoke reader.ready().

  2. Return -1 from read() at the end of the stream. The documentation doesn't emphasize it:

    If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

    If you choose to block in read() when no data is available, you have to remember to return -1 in some situations. Otherwise a call to read(byte b[], int off, int len) may block, because of the following code in its source:

    for (; i < len ; i++) { // default len is a relatively large number (8192 - readPosition)
        c = read();
        if (c == -1) {
            break;
        }
        b[off + i] = (byte)c;
    }
    

    And this causes some (if not all) high-level reads to block, such as a reader's readLine(), read(), etc.
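
Both points can be illustrated with a small sketch, assuming the data is already loaded into a byte array. The class name LoadedDataInputStream is hypothetical, and java.io.ByteArrayInputStream already behaves this way, so this is only meant to show which methods matter.

```java
import java.io.InputStream;

// Hypothetical stream over data that is already held in memory.
class LoadedDataInputStream extends InputStream {
    private final byte[] data;
    private int pos = 0; // index of the next byte to return

    LoadedDataInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        // Return -1 once everything has been consumed, so that the
        // inherited read(byte[], int, int) loop terminates.
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    @Override
    public int available() {
        // Report the bytes that can be read without blocking,
        // instead of the default 0.
        return data.length - pos;
    }
}
```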

Tony
  • 5,972
  • 2
  • 39
  • 58
4

For possibly large data you can use com.google.common.io.FileBackedOutputStream from guava.

Javadoc: An OutputStream that starts buffering to a byte array, but switches to file buffering once the data reaches a configurable size.

Using out.getSupplier().getInput() you get your InputStream.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • so could I first use it as an output stream, submitting all my data, and then get an input stream providing all that data to my FTP client? – John Manak Jan 27 '11 at 12:03
  • Yes and no, it should be easier than doing it on the fly. If there's any error when reading, nothing gets sent. You can display the whole content while debugging if you want to. It's up to you. – maaartinus Jan 27 '11 at 23:15
1

There's absolutely no need to create a custom InputStream. Use ByteArrayInputStream, something like this:

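// needs java.io.ByteArrayInputStream, java.io.InputStream, java.nio.charset.StandardCharsets
public static InputStream createStream(){
    final String csv = createCsvFromDataBaseValues();
    // name the charset explicitly; getBytes() alone uses the platform default
    return new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8));
}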

Especially given this quote:

My idea is to load all the data in the constructor and then override the read method.

If you do it like this, you gain absolutely nothing by implementing a custom InputStream. It's pretty much equivalent to the approach I outlined above.

Sean Patrick Floyd
  • 292,901
  • 67
  • 465
  • 588
  • it's probably in tens of thousands of `Application` objects; each produces a line in the CSV file of roughly 100 characters. would it be a good idea to produce such a long string in memory or would it be better to make a temporary file and transfer it when it's done? – John Manak Jan 26 '11 at 13:52
  • @John either way, I'd create a common interface that I'd pass to the Application objects, and I'd experiment with both `StringBuilder`- and `File` backed versions of this interface. – Sean Patrick Floyd Jan 26 '11 at 13:56
  • My understanding says, it would fail if the data is larger than 2GB. Correct me, if I am wrong. – Haseeb Jadoon Aug 10 '22 at 13:07
1

Why do you need a custom InputStream? Why not just write the CSV data, as you generate it, to the OutputStream that is being written to the FTP server?
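
A rough sketch of this idea, assuming your FTP client can hand you an OutputStream for the remote file and that each row is available as a String[] of fields. The class CsvWriterSketch and the method writeCsv are hypothetical names used only for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CsvWriterSketch {
    // Hypothetical helper: writes each row as one comma-separated line.
    // No quoting or escaping is done, so fields must not contain
    // commas, quotes or line breaks.
    static void writeCsv(List<String[]> rows, OutputStream out) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String[] row : rows) {
            w.write(String.join(",", row));
            w.write("\n");
        }
        w.flush(); // flush, but leave closing the stream to the caller
    }
}
```

Rows are written as they are produced, so the whole file never has to sit in memory.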

jtahlborn
  • 52,909
  • 5
  • 76
  • 118
  • It's a nice approach if you can partition the reads and writes into [separate threads](https://docs.oracle.com/javase/9/docs/api/java/io/PipedInputStream.html). – Brent Bradburn Aug 02 '18 at 00:04
  • @nobar, that's rarely useful. about the only time that's a useful thing to do is if the producer and consumer can both arbitrarily block on some other resources. – jtahlborn Aug 13 '18 at 14:45
0

If the data is not too large, you could:

  • Read it all
  • Convert to CSV (text)
  • Get the text bytes (via String.getBytes(encoding))
  • Put the byte array in a ByteArrayInputStream
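
The steps above could look roughly like this, assuming the rows have already been converted to CSV lines and that UTF-8 is the encoding you want. CsvInMemory and toStream are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CsvInMemory {
    // Hypothetical helper: joins the lines, encodes them, and wraps the bytes.
    static InputStream toStream(List<String> csvLines) {
        String text = String.join("\n", csvLines) + "\n"; // CSV as text
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // text as bytes
        return new ByteArrayInputStream(bytes);
    }
}
```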
Bart van Heukelom
  • 43,244
  • 59
  • 186
  • 301