Suppose we have the following batch script, which redirects the text file sample.txt into STDIN:

@echo off
< "sample.txt" (
    set /P "ONE="
    set /P "TWO="
    findstr /R "^"
)
echo %ONE%, %TWO%

...and the content of the related text file sample.txt:

first
second
third
fourth

The console output is as follows, which is exactly what I expect (the lines first and second are consumed by set /P, so findstr receives and processes only the remaining lines):

third
fourth
first, second

The same output is achieved when findstr /R "^" is replaced by sort /R.

However, when replacing the findstr command line by find /V "" or by more, the output will be:

first
second
third
fourth
first, second

It seems that although set /P has already consumed the lines first and second, as the last line of the output proves, find and more still receive the entire redirected data.

Why is this, what causes this behaviour? Is there a way to force find or more to receive only the remaining redirected data that has not already been processed by a preceding command?

(The behaviour is the same when STDOUT is redirected to a text file. Nothing changes either when a command line similar to the batch code above is executed in cmd directly.)

aschipfl
  • You may read a description of this behavior [here](http://stackoverflow.com/questions/8844868/what-are-the-undocumented-features-and-limitations-of-the-windows-findstr-comman/28278628#28278628), but the answer to your question is: "because such commands were programmed this way" – Aacini May 24 '16 at 23:05
  • Very interesting, @Aacini! I already suspected that `find` and `more` reset the file pointer, because I played around with all the commands I mentioned, mixed and reordered them, and I even wrapped a `for /F` around them (like: `for /F "delims=" %%L in ('more') do echo(%%L`), which all changed nothing at all. So I fear, there seems to be no (native) way to work around that behaviour? – aschipfl May 24 '16 at 23:50
  • You could try to pipe the result of `findstr /R "^"` to `find /V ""` or `more` – Dennis van Gils May 25 '16 at 07:35
  • Good idea, @DennisvanGils, piping allows `find` or `more` to be applied to the _remaining_ data rather than to all of it... – aschipfl May 25 '16 at 08:50
  • Unfortunately I just found out that the `findstr` method hangs in case the last line of `sample.txt` is not terminated by a line-break; the other commands (`sort`, `find`, `more`) work fine; so there seems to be no reliable way of returning the _remaining_ lines... or do you have any ideas? I tried to pipe a single line-break (`echo(`) into the expression in parentheses, but then the last line gets lost (by `findstr`)... – aschipfl May 25 '16 at 09:06
  • You could try `findstr /r "^" > someFile.txt &echo. >> someFile.txt`, then `find < someFile.txt` – Dennis van Gils May 25 '16 at 10:38
  • Yes of course, @DennisvanGils, appending a line-break can even be done in advance (with a single `STDOUT` redirection like `> "interim.txt" (findstr /R "^" "sample.txt" & echo()`), it is even possible to replace the `STDIN` redirection by a pipe, like `(findstr /R "^" "sample.txt" & echo()) | (rem /* original code in parens */)`; but pipes I don't like (particularly because of trouble with delayed expansion); the interim/temp. file stuff works, but it introduces additional file I/O operations... – aschipfl May 25 '16 at 12:28
  • While not helpful in a technical sense, I think you're finding an interesting holdover from the old `DOS` days. `FIND`, `MORE`, and `SORT` existed, whereas `FINDSTR` was added later, and may have been written to a newer set of standards. – Steven K. Mariner Aug 04 '17 at 23:46
  • @DennisvanGils, your [suggestion](https://stackoverflow.com/questions/37423428/why-do-some-commands-process-lines-of-redirected-stdin-data-which-are-already-co#comment62365148_37423428) of piping the data instead does unfortunately not work together with `set /P`, according to [this thread](https://stackoverflow.com/questions/41351844/piping-into-set-p-fails-due-to-uninitialised-data-pointer)... – aschipfl May 15 '18 at 20:04
  • To append a line-break to the last line only when there is none (see also [above](https://stackoverflow.com/questions/37423428/why-do-some-commands-process-lines-of-redirected-stdin-data-which-are-already-co#comment62376900_37423428)), `< "sample.txt" find /V "" | findstr /R "^"` could be done because `find` does exactly this job... – aschipfl Jun 13 '18 at 10:40

2 Answers

Why do some commands process lines of redirected STDIN data which are already consumed by other commands?

Because some commands/programs rewind stdin. You can try this:

@echo off
< "sample.txt" (
    set /P "ONE="
    set /P "TWO="
    more +2
)
echo %ONE%, %TWO%

Result:

    third
    fourth
    first, second

The more +2 command skips the first two lines of the file, so only the remaining lines are displayed.
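The rewinding described above can be modelled outside of cmd. The following Python sketch is only an illustration under stated assumptions, not the actual implementation of find or more: it assumes that all commands inside the redirected block share one file handle with a single read position, and that a command which "rewinds" simply seeks back to offset 0 before reading. The helper names read_line, polite_filter and rewinding_filter are hypothetical.

```python
import os
import tempfile


def read_line(fd):
    """Read one line byte by byte, so the shared read position
    stops right after the line break (a stand-in for set /P)."""
    chunks = []
    while True:
        byte = os.read(fd, 1)
        if not byte or byte == b"\n":
            break
        chunks.append(byte)
    return b"".join(chunks).decode().rstrip("\r")


def polite_filter(fd):
    """Hypothetical filter that reads from the current position onwards."""
    data = b""
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            break
        data += chunk
    return data.decode().splitlines()


def rewinding_filter(fd):
    """Hypothetical filter that seeks back to the start before reading
    (assumed model of a command that 'rewinds stdin')."""
    os.lseek(fd, 0, os.SEEK_SET)
    return polite_filter(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", newline="\n") as handle:
            handle.write("first\nsecond\nthird\nfourth\n")
        # O_BINARY only exists on Windows; fall back to 0 elsewhere.
        fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
        try:
            one = read_line(fd)               # consumes "first"
            two = read_line(fd)               # consumes "second"
            rest = polite_filter(fd)          # sees only the remainder
            everything = rewinding_filter(fd) # sees the whole file again
        finally:
            os.close(fd)
    return one, two, rest, everything


if __name__ == "__main__":
    print(demo())
```

Calling demo() returns the two consumed lines, then the remainder as seen by a filter that respects the current read position, then the full content as seen by a filter that seeks back to the beginning.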

jwdonahue
  • By `echo %ONE%, %`, you actually mean `echo %ONE%, %TWO%`, right? This is a nice work-around for the code at hand, but `more` still reprocesses all data (it receives all lines, it just skips two lines from its output)... – aschipfl Dec 27 '17 at 08:22
  • @aschipfl, yes it should have been `%TWO%` not `%`, I missed that. Just edited the answer. Yes, `more` still rewinds stdin, in order to process the entire file. Anyway, I think it's the answer to your question, some programs don't behave like proper filters in a pipe chain. – jwdonahue Dec 27 '17 at 23:24

Well, the spot-on answer to the question as to why commands behave the way they do lies in Aacini's comment: »because such commands were programmed this way«.

Anyway, after quite some time, I want to collect my findings and finally present a new work-around I recently found.

There are only a few commands that seem not to reset the data pointer, and each has got its pros and cons:

  1. The usage of findstr to return the remainder of the data is already demonstrated in the question. The problem is that findstr may hang when the redirected input data is not terminated by a final line-break; see: What are the undocumented features and limitations of the Windows FINDSTR command?

  2. pause does not reset the data pointer (which is in fact why I wanted to mention it here), regardless of whether the data comes from input redirection or from a pipe; unfortunately, it does not provide the consumed character by any means.

  3. set /P is fine for reading single lines that are no longer than about 1 KB, so to return the remainder of the redirected data you need some kind of loop:

     @echo off
     rem // Count total number of available lines in advance:
     for /F %%C in ('^< "sample.txt" find /C /V ""') do set "COUNT=%%C"
     < "sample.txt" (
          set /P "ONE="
          set /P "TWO="
          rem /* Loop here to return the rest; `3` is `1 + 2`, where `2`
          rem    is the hard-coded number of lines already handled; you can
          rem    just use `1` here, which will cause read attempts beyond
          rem    the end of data, causing empty lines to be returned: */
          for /L %%N in (3,1,%COUNT%) do (
              rem // Replace `&&` by `&` to NOT skip empty lines:
              set "LINE=" & set /P "LINE=" && call echo(%%LINE%%
          )
     )
     echo %ONE%, %TWO%
    

    Note that set /P cannot be used within pipes; see: Piping into SET /P fails due to uninitialised data pointer?

  4. Finally, sort can be used to return the remainder. To prevent it from jumbling the lines of text, use the character position option /+n and set n to a number beyond the actual line lengths:

     @echo off
     set "ONE="
     set "TWO="
     < "sample.txt" (
         set /P "ONE="
         set /P "TWO="
         rem /* `sort` with the sort position set beyond the lines of text seems to
         rem    simply reverse the current line order; a second reversal restores the
         rem    original order; I set the sort position just beyond the maximum
         rem    record or line length, which I set to the greatest possible value: */
         sort /+65536 /REC 65535 | sort /+65536 /REC 65535
     )
     echo %ONE%, %TWO%
    

    I set the record or line length (/REC) to the greatest possible value because it defaults to 4096. Note that the minimum value is actually 128; anything less is raised to that. Also note that line-breaks count towards the record length as well.

aschipfl