
How can I get the number of lines (rows) from an InputStream or from a CsvMapper without looping through and counting them?

Below I have an InputStream created from a CSV file.

InputStream content = (... from a resource ...);
CsvMapper mapper = new CsvMapper();
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
MappingIterator<Object[]> it = mapper
        .reader(Object[].class)
        .readValues(content);

Is it possible to do something like

int totalRows = mapper.getTotalRows();

I would like to use this number in the loop to update progress.

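int currentRow = 0;
while (it.hasNextValue()) {
    Object[] row = it.nextValue(); // advance the iterator, otherwise the loop never ends
    currentRow++;
    // do stuff here with row

    updateProgressHere(currentRow, totalRows);
}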

Obviously, I could loop through once to count the rows, then loop through again to process them while updating progress. That is inefficient and slow, because some of these InputStreams are huge.

Ervin
  • They need to be counted somehow. Unless the number of rows is specified somewhere in the CSV file, there's no way around iterating through it. Your best bet may be to get the size of the file, then keep a running tally of the size of each line processed. You could use that to compute a percentage of completion. – dan Mar 07 '14 at 15:32
  • How long does the operation usually take? If it's on the order of 20-30 seconds, you can generally get away with one of those indeterminate back-and-forth progress bars without hurting the UX (or even a completely fake one that counts down a fixed upper-bound amount of time, which is a pleasant surprise for the user if it ends early; cheesy, but it might accomplish the goal of reassuring the user that the program is working). – Jason C Mar 07 '14 at 15:49
  • Longer than that :( each row in the file ends up being a separate request. – Ervin Mar 07 '14 at 15:57
  • @tiger13cubed What is the source of the file? Is it just a generic file upload, or does it come from, e.g., a custom client application or an AJAX request? If the client can read the file size (or just quickly count the lines in the file) ahead of time, it could include the row count or size as a URL parameter, which you can pass along. – Jason C Mar 07 '14 at 16:02
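A minimal Java sketch of dan's suggestion, under stated assumptions: take the total size up front (here the byte length of an in-memory string stands in for `File.length()` or a known request length), add each line's byte length as it is processed, and turn the running tally into a percentage. It reads lines with a plain `BufferedReader` rather than `CsvMapper`, and it assumes UTF-8 text with single-byte `\n` line endings, so the figure is only approximate (CRLF files undercount by one byte per line). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class LineTallyProgress {

    // Completion percentage (0-100) from bytes processed so far and the total
    // size known up front. Returns 0 when the total is unknown.
    public static int percentDone(long bytesProcessed, long totalBytes) {
        if (totalBytes <= 0) {
            return 0; // unknown size: an indeterminate progress bar is the fallback
        }
        return (int) Math.min(100, bytesProcessed * 100 / totalBytes);
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b\n1,2\n3,4\n";
        long totalBytes = csv.getBytes(StandardCharsets.UTF_8).length;

        long bytesProcessed = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the terminator, so add 1 for the assumed "\n"
                bytesProcessed += line.getBytes(StandardCharsets.UTF_8).length + 1;
                System.out.println(percentDone(bytesProcessed, totalBytes) + "%"); // 33%, 66%, 100%
            }
        }
    }
}
```

The `Math.min` cap keeps a final line without a trailing newline from reporting more than 100%.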

2 Answers


Unless you know the row count ahead of time, it is not possible without looping. You have to read the file in its entirety to know how many lines it contains, and neither InputStream nor CsvMapper has a means of reading ahead and abstracting that for you (both are stream-oriented interfaces).

None of the interfaces that ObjectReader can operate on support querying the underlying file size (if it's a file) or number of bytes read so far.

One option is to create your own custom InputStream that also provides methods for getting the total size and the number of bytes read so far. For example, if it is reading from a file, it can expose the underlying File.length() and also track the number of bytes read. This may not be entirely accurate, especially if Jackson buffers far ahead, but it could at least give you something.
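A sketch of such a stream, assuming `java.io.FilterInputStream` as the base class: the hypothetical `CountingInputStream` below counts every byte that its reader (for example, Jackson's parser) pulls through it. Because the parser reads into its own buffer, the count runs ahead of the rows actually processed, which is the inaccuracy mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper that counts the bytes read through it.
public class CountingInputStream extends FilterInputStream {
    private long bytesRead = 0;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            bytesRead++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is counted once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        bytesRead += skipped;
        return skipped;
    }

    // Advertise no mark/reset support so re-read bytes can't inflate the count.
    @Override
    public boolean markSupported() {
        return false;
    }

    public long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a,b\n1,2\n3,4\n".getBytes(StandardCharsets.UTF_8);
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        in.read(new byte[5], 0, 5);
        System.out.println(in.getBytesRead() + " of " + data.length); // 5 of 12
    }
}
```

In the question's code you would wrap `content` in a `CountingInputStream`, pass the wrapper to `readValues(...)` instead, and inside the loop compare `getBytesRead()` with whatever total you have (e.g. `File.length()`). Without a known total, the count alone cannot produce a percentage.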

Jason C
  • One approach I considered was from this post http://stackoverflow.com/questions/8505670/get-the-number-of-bytes-of-a-file-behind-a-java-inputstream but what I was really hoping for was progress based on rows, not bytes. Does it help to clarify that this InputStream is coming in as a POST request body? – Ervin Mar 07 '14 at 15:38
  • @tiger13cubed Maybe, if the receiver class that's implementing `InputStream` exposes the POST length. What HTTP server are you using? Is it Tomcat or something self-contained? Using `available()` as an initial estimate and progress counter may work; you'd really just have to try it. Unless your sender specifically sends the row count first, though, you *have* to either go with bytes (if available) or parse the whole thing first (which defeats the purpose, of course). – Jason C Mar 07 '14 at 15:45
  • (Although, I guess if it's chunked, there's no total length available in the request, and so `available()` probably wouldn't get you very far either.) – Jason C Mar 07 '14 at 16:03

Technically speaking, there are only two ways: either loop through the data and increment a counter (as you have seen), or:

On the sender side, send the count first, followed by the data. The receiver can then read the first bytes of the stream as the count before it starts reading the rows. The precondition, of course, is that the sending application knows in advance how much data it is going to send.
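A minimal sketch of that idea, assuming both ends are under your control: the sender writes the row count as a 4-byte big-endian int with `DataOutputStream.writeInt`, then the CSV rows; the receiver reads it back with `DataInputStream.readInt` and is left positioned at the first CSV byte. The 4-byte framing and the class name are illustrative choices, not an established protocol; in an HTTP setting, a header or URL parameter carrying the count (as suggested in the question's comments) may be easier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CountPrefixedStream {

    // Sender side: row count first (4-byte big-endian int), then the rows.
    public static byte[] encode(String[] rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.length);
            for (String row : rows) {
                out.write((row + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = encode(new String[] {"1,2", "3,4", "5,6"});

        // Receiver side: read the count, then hand the rest of the stream on.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        int totalRows = in.readInt();
        // "in" now starts at the first CSV byte, so it could be passed to the
        // ObjectReader's readValues(in) in place of the raw request stream.
        System.out.println("totalRows = " + totalRows); // totalRows = 3
    }
}
```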

Meno Hochschild