
I need to know the number of lines of a file before processing it or, in the worst-case scenario, read the file twice. I wrote the code below, but it does not work. Is this simply not possible?

InputStream inputStream2 = getInputStream();

BufferedReader reader = new BufferedReader(new InputStreamReader(getInputStream()));

String line;
int numLines = 0;
while ((line = reader.readLine()) != null) {
        numLines++;
}

TextFileDataCollection dataCollection = new TextFileDataCollection (numLines, 50);

BufferedReader reader2 = new BufferedReader(new InputStreamReader(inputStream2));

while ((line = reader2.readLine()) != null) {
    // Tokenize the line that was just read; calling readLine() again here would skip every other line
    StringTokenizer st = new StringTokenizer(line, ",");
    while (st.hasMoreElements()) {
        System.out.println(st.nextElement());
    }
}
La Carbonell
  • 1
    its definitely possible and there are plenty of examples on SO detailing it. https://stackoverflow.com/questions/453018/number-of-lines-in-a-file-in-java – Harry Jul 06 '17 at 14:22
  • 3
    Possible duplicate of [Number of lines in a file in Java](https://stackoverflow.com/questions/453018/number-of-lines-in-a-file-in-java) – kennyFF92 Jul 06 '17 at 14:23
  • 1
    Well no, @Harry, it is not possible to know the number of lines in a file *without processing it*. You need to examine the file's contents to count lines, and that's one form of processing. – John Bollinger Jul 06 '17 at 14:24
  • 1
    You're using the same InputStream twice - after the first loop, it's at the end of the file, so the second loop will not read anything. You need to open **a new InputStream** for the second loop. And you need to **close the old one** after the first loop. – Erwin Bolwidt Jul 06 '17 at 14:32
  • 1
    If you don't want to iterate through the file twice, you could consider redesigning the `TextFileDataCollection` so it doesn't need to know the dataset size up-front. – dcsohl Jul 06 '17 at 14:44

2 Answers

0

Here's a similar, somewhat older question with Java code:

[Number of lines in a file in Java](https://stackoverflow.com/questions/453018/number-of-lines-in-a-file-in-java)

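import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts lines by scanning the raw bytes for '\n'.
// A final line without a trailing newline is counted as well.
public static int countLines(String filename) throws IOException {
    // try-with-resources closes the stream even if reading fails
    try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars;
        boolean endsWithNewline = true; // stays true for an empty file, which has 0 lines
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            if (readChars > 0) {
                endsWithNewline = c[readChars - 1] == '\n';
            }
        }
        // Add one for a last line that is not terminated by '\n'
        return endsWithNewline ? count : count + 1;
    }
}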

EDIT:

Here's a reference that deals with input streams specifically:

From Total number of rows in an InputStream (or CsvMapper) in Java

"Unless you know the row count ahead of time, it is not possible without looping. You have to read that file in its entirety to know how many lines are in it, and neither InputStream nor CsvMapper have a means of reading ahead and abstracting that for you (they are both stream oriented interfaces).

None of the interfaces that ObjectReader can operate on support querying the underlying file size (if it's a file) or number of bytes read so far.

One possible option is to create your own custom InputStream that also provides methods for grabbing the total size and number of bytes read so far, e.g. if it is reading from a file, it can expose the underlying File.length() and also track the number of bytes read. This may not be entirely accurate, especially if Jackson buffers far ahead, but it could get you something at least."
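The custom stream suggested in that quote could look roughly like the sketch below. The class name `CountingInputStream` and its `getBytesRead()` method are made up for illustration and are not part of the JDK or Jackson. The class only tracks bytes consumed so far; it does not know the line count in advance.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: wraps any InputStream and tracks how many bytes were read through it.
final class CountingInputStream extends FilterInputStream {
    private long bytesRead = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            bytesRead++;
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates to this method, so it is counted too
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        bytesRead += skipped;
        return skipped;
    }

    // The counter is not rewound on reset(), so mark/reset is reported as unsupported
    @Override
    public boolean markSupported() {
        return false;
    }

    long getBytesRead() {
        return bytesRead;
    }
}
```

If the data does come from a file, comparing `getBytesRead()` with `File.length()` gives a rough progress estimate, with the caveat from the quote that buffering readers may have consumed more bytes than they have processed.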

CSLearner
  • 1
    Someone indicated that the question is about inputstreams rather than a file. Unfortunately that means this specific solution probably won't work then. – CSLearner Jul 06 '17 at 14:27
0

You write

I need to know the number of lines of a file before processing it

but you don't present any file in your code; rather, you present only an InputStream. This makes a difference, because indeed no, you cannot know the number of lines in the input without examining the input to count them.

If you had a file name, File object, or similar mechanism by which you could access the data more than once, then that would be straightforward, but a stream is not guaranteed to be associated with any persistent file -- it might convey data piped from another process or communicated over a network connection, for example. Therefore, each byte provided by a generic InputStream can be read only once.

InputStream does provide an API for marking (mark()) a position and later returning to it (reset()), but stream implementations are not required to support it, and many do not. Those that do support it typically impose a limit on how far past the mark you can read before invalidating it. Readers support such a facility as well, with similar limitations.
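As a minimal sketch of that mark/reset approach, assuming the whole input fits within the chosen read limit: the class and method names here are hypothetical, and `readLimit` is a value you would have to pick yourself.

```java
import java.io.IOException;
import java.io.InputStream;

final class MarkResetLineCount {
    // Counts the lines in the stream, then rewinds it to where it started.
    // reset() may fail with an IOException if more than readLimit bytes were read.
    static int countLinesAndRewind(InputStream in, int readLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(readLimit);
        int count = 0;
        int last = -1;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                count++;
            }
            last = b;
        }
        // Count a final line that has no terminating newline
        if (last != -1 && last != '\n') {
            count++;
        }
        in.reset();
        return count;
    }
}
```

A stream that does not support marking can be wrapped in a `BufferedInputStream`, which does, but the read limit still bounds how much input can be rewound, so this only suits inputs with a known maximum size.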

Overall, if your only access to the data is via an InputStream, then your best bet is to process it without relying on advance knowledge of the contents. But if you want to be able to read the data twice, to count lines first, for example, then you need to make your own arrangements to stash the data somewhere in order to ensure your ability to do so. For example, you might copy it to a temporary file, or if you're prepared to rely on the input not being too large for it then you might store the contents in memory as a List of byte, byte[], char, or String.
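Both ways of stashing the data might be sketched as follows. The class and method names are illustrative only, and UTF-8 is an assumption about the input's encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class StashInput {
    // Option 1: keep every line in memory; the list's size() is the line count.
    // Only suitable when the input is known to be small enough.
    static List<String> readAllLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Option 2: spool the stream to a temporary file that can be opened as often as needed.
    // The caller is responsible for deleting the file afterwards.
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("stash-", ".txt");
        // createTempFile already created the file, so it has to be replaced
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

With the first option, the code from the question could construct `new TextFileDataCollection(lines.size(), 50)` and then tokenize each stored line, without touching the stream a second time. With the second, the temporary file can be read once to count lines and again to process them.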

John Bollinger