2

Problem statement

In a compiled .jar file that I execute on wsl Ubuntu, I run command: task reportToGetUrgencyOfAllTasks which returns a list of tasks with columns: id, uuid, urgency. I can inspect the list, it is printed to terminal, and it consist of 1793 tasks and 1798 lines (a header and some irrelevant messages about requiring sync.). When I count the number of objects with reader.lines().count() it returns 1798 and consumes the lines as expected.

If I create a while loop with:

long counter = 0;
while (reader.lines().iterator().hasNext()) {
    counter++;
    System.out.println("counter=" + counter);
}

It lists numbers 1 to 1798, which is weird because I thought the .hasNext() did not consume items, but only checked whether the next element existed or not, without taking it from the stream. (As suggested here:https://hajsoftutorial.com/iterator-hasnext-and-next/). I expected an infinite loop of numbers because the iterator would stay at the first element forever after the hasNext().

So then if I want to take all elements from the reader and put them into an ArrayList with:

ArrayList<String> capturedCommandOutput = new ArrayList<String>();
while (reader.lines().iterator().hasNext()) {
    counter++;
    capturedCommandOutput.add(reader.lines().iterator().next());
}

It skips every other line.

Question

Why does hasNext() consume elements from the iterator in this scenario?

Complete code For an MWE one needs to install taskwarrior on WSL Ubuntu, however, I think this is a Java problem since I can see that the reader does contain all lines/information, hence for completeness: the complete method that runs the command is:

package customSortServer;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Map;
import java.util.StringJoiner;

public class RunCommandsLongOutput2 {
    /**
     * This executes the commands in terminal. Additionally it sets an environment
     * variable (not necessary for your particular solution) Additionally it sets a
     * working path (not necessary for your particular solution)
     * 
     * @param commandData
     * @param ansYes
     * @throws Exception
     */
    public static ArrayList<String> executeCommands(Command command, Boolean ansYes) {
        ArrayList<String> capturedCommandOutput = new ArrayList<String>();
        File workingDirectory = new File(command.getWorkingDirectory());

        // create a ProcessBuilder to execute the commands in
        ProcessBuilder processBuilder = new ProcessBuilder(command.getCommandLines());

        // this is set an environment variable for the command (if needed)
        if (command.isSetEnvVar()) {
            processBuilder = setEnvironmentVariable(processBuilder, command);
        }

        // this can be used to set the working directory for the command
        if (command.isSetWorkingPath()) {
            processBuilder.directory(workingDirectory);
        }

        // execute the actual commands
        try {
            Process process = processBuilder.start();
            System.out.println("Started");
            if (command.isGetOutput()) {

                // capture the output stream of the command
                BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
//              System.out.println(reader.lines().count());
//              long counter = 0;
                while (reader.lines().iterator().hasNext()) {

                    capturedCommandOutput.add(reader.lines().iterator().next());
                }
//              while (reader.lines().iterator().hasNext()) {
//                  counter++;
//                  System.out.println("line=" + reader.lines().iterator().next());
//              }
            }

            // connect the output of your command to any new input.
            // e.g. if you get prompted for `yes`
            new Thread(new SyncPipe(process.getErrorStream(), System.err)).start();
            new Thread(new SyncPipe(process.getInputStream(), System.out)).start();
            PrintWriter stdin = new PrintWriter(process.getOutputStream());

            // This is not necessary but can be used to answer yes to being prompted
            if (ansYes) {
                stdin.println("yes");
            }

            // write any other commands you want here
            stdin.close();

            // If the command execution led to an error, returnCode!=0, or not (=0).
            int returnCode = process.waitFor();
            System.out.println("Return code = " + returnCode);
        } catch (IOException e1) {
            e1.printStackTrace();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // return output if required:
        return capturedCommandOutput;
    }

    /**
     * source: https://stackoverflow.com/questions/7369664/using-export-in-java
     * 
     * @param processBuilder
     * @param varName
     * @param varContent
     * @return
     */
    private static ProcessBuilder setEnvironmentVariable(ProcessBuilder processBuilder, Command command) {
        Map<String, String> env = processBuilder.environment();
        env.put(command.getEnvVarName(), command.getEnvVarContent());
        processBuilder.environment().put(command.getEnvVarName(), command.getEnvVarContent());
        return processBuilder;
    }
}

And the method that generates the command is:

    public static void createCommandToGetUrgencyList(HardCoded hardCoded) {

        // create copy command
        Command command = new Command();
        String[] commandLines = new String[2];
        commandLines[0] = "task";
        commandLines[1] = hardCoded.getGetUrgencyReportName();
        command.setCommandLines(commandLines);
        command.setEnvVarContent("/var/taskd");
        command.setEnvVarName("TASKDDATA");
        command.setWorkingPath("/usr/share/taskd/pki");
        command.setGetOutput(true);

        // execute command to copy file
        ArrayList<String> urgencyList = RunCommandsLongOutput2.executeCommands(command, false);
        System.out.println("The urgency list has length = "+urgencyList.size());
        for (int i = 0; i < urgencyList.size(); i++) {
            System.out.println("The output of the command =" + urgencyList.get(i));
        }
    }
Jonas
  • 121,568
  • 97
  • 310
  • 388
a.t.
  • 2,002
  • 3
  • 26
  • 66
  • 2
    You might want to use the same iterator in the while condition and in the body of the loop. – second Jul 22 '19 at 09:43

2 Answers2

2

You create multiple Iterators - one in the loop's condition, and then one more in each iteration of the loop.

It should be:

Iterator<String> iter = reader.lines().iterator();
long counter = 0;
while (iter.hasNext()) {
    counter++;
    System.out.println("counter=" + counter);
    capturedCommandOutput.add(iter.next());
}

Since each Iterator is generated from a different Stream<String> (returned by lines()), it is possible to call a terminal operation (iterator()) on each of the Streams, but when you call methods of the Iterators (either hashNext() or next()), the Iterators consume data from their single source of data - the BufferedReader.

As the Javadoc of lines() says:

The reader must not be operated on during the execution of the terminal stream operation. Otherwise, the result of the terminal stream operation is undefined.

iterator() is a terminal operation, and as long as you iterate on the Iterator returned by it, you are still not done with the terminal operation. Therefore, you should not operate on the reader until you finish with the Iterator. Calling lines() a second time counts as operating on the reader.

Eran
  • 387,369
  • 54
  • 702
  • 768
  • Thank you, so to understand it correctly, the `reader.lines().iterator()` makes an `iterator` object, and since I use that line twice, it creates that iterator object twice. I get that's the problem, but I do not understand how that leads to double consumption of the elements from the `reader`. I assume the `reader.lines.iterator().hasNext()` takes at least the first element from the reader into the (new) iterator object, which causes the reduction of elements in the `reader`, and then when I create a new `iterator()` object in the while loop, the it again takes an element from the `reader`? – a.t. Jul 22 '19 at 09:50
  • 1
    @a.t. Since `reader.lines()` returns a `Stream` and `iterator()` is a terminal operation on that Stream. Since each iterator created via `reader.lines().iterator()` consume data from the same `BufferedReader` source, alternating calls to the different iterators (regardless of whether you call `next()` or `hasNext)`) consume a line from the Reader. Note the Javadoc of `lines()` says - `The reader must not be operated on during the execution of the terminal stream operation. Otherwise, the result of the terminal stream operation is undefined.` – Eran Jul 22 '19 at 09:59
2

You are creating a stream and an iterator from the stream again and again.

Calling reader.lines() creates a new stream every time - it has to, as streams are not reusable.

Calling iterator() on the stream is a terminal operation.

So what you are doing is creating a stream on the remaining elements in the reader, and then creating an iterator on it.

The iterator contract does not say that it wouldn't consume any elements from the stream. What it says is that it does not consume any elements from the iterator itself. That is, if you used

Iterator<String> iter = reader.lines().iterator();

And you call iter.hasNext(), you can always expect the first line that was available in the reader to be the element you read from iter - not from any other iterator on the same reader.

One way to implement this would be to read an element from the stream the first time you call hasNext() and keep it buffered. Once you call next() it will give you that buffered element. This maintains the iterator contract - but it still reads a line from the reader.

Now if you create another stream and another iterator, it will just consume the next line, and so on.

You should only call lines() once on the reader, and you should only call iterator() once on the resulting stream - or you should just use the "conservative" method of using readLine() until null is returned.

RealSkeptic
  • 33,993
  • 7
  • 53
  • 79