I want to run a script on each line of an input file/stream in bash. This is simple when the lines are short:

while read -r line; do echo "$line" | ./process.sh; done < input.txt

However, the lines are very long (several MiB), so I cannot use read any more. This works:

split -l1 < input.txt
for file in x??; do ./process.sh < "$file"; done
rm x??

but it's very slow as it creates temporary files.

Is there a way to pipe the input lines directly to process.sh so that the script is invoked once per line?

Christoph Walesch

4 Answers

Given this input file:

$ cat file
foo bar
etc

Try this with GNU xargs (using awk '{print "<"$0">"}' as ./process.sh):

$ < file xargs -I {} -d'\n' printf '%s\n' '{}' | awk '{print "<"$0">"}'
<foo bar>
<etc>

or this otherwise:

$ tr '\n' '\0' < file | xargs -I {} -0 printf '%s\n' '{}' | awk '{print "<"$0">"}'
<foo bar>
<etc>

See https://stackoverflow.com/a/28806991/1745001 for an explanation.
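
If ./process.sh has to read each line from standard input instead of receiving it as an argument, a possible adaptation along the same lines (a sketch, not from the commands above) is:

$ < file xargs -d'\n' -n1 sh -c 'printf "%s\n" "$1" | ./process.sh' sh

Note that xargs passes each line to the command as an argument, so lines of several MiB may exceed the kernel's per-argument limit (roughly 128 KiB on Linux).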

Ed Morton

GNU parallel, unlike xargs, can send the data to the standard input of the command instead of passing it as arguments. So you could try:

parallel -j1 -N1 --pipe ./process.sh < input.txt

-j1 for one job at a time only. Use -jN to run N jobs in parallel, or -j+0 to run as many jobs as you have cores on your computer.

-N1 to pass one line per job.

--pipe to send the data to the standard input of the command instead of passing it as command arguments.
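
As a quick check, with wc -c standing in for ./process.sh, each line (newline included) should reach a separate invocation:

$ printf 'foo bar\netc\n' | parallel -j1 -N1 --pipe wc -c
8
4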

Renaud Pacalet

Try GNU sed and the e flag of the s command, which executes the resulting pattern space as a shell command:

$ cat file
one
two
tre

$ sed 's/.*/echo "|&|"/e' file
|one|
|two|
|tre|

In your case it would be something like:

$ sed 's|.*|./process.sh "&"|e' file

Note, though, that the line is spliced into the generated command as an argument, so lines containing double quotes or other shell metacharacters would break it, and multi-MiB lines may exceed argument-length limits.

Awk can also run commands, but I bet awk alone can do what you need.
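
For instance, a minimal sketch that starts a fresh ./process.sh for every line by closing the pipe after each write:

$ awk '{ print | "./process.sh"; close("./process.sh") }' input.txt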

Ivan

I've tested the while read ... code in the question with input lines of 8MB and it works fine. I assume your issue is that it is too slow.

If you can use Perl then this code shows one way to do it:

perl -Mautodie -nle "open P, '| ./process.sh'; print P; close P" input.txt
  • The -Mautodie causes the code to fail with error messages if any file or pipe operations fail. Although the autodie module has been a core module in Perl for over a decade, it may not be available in some Perl installations. Some are very old. Others (for unknown reasons) don't include all core modules. The code will work if you remove -Mautodie, but it may fail silently if something goes wrong.
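
A rough equivalent without -Mautodie, checking each operation explicitly (a sketch along the same lines):

perl -nle 'open(my $p, "|-", "./process.sh") or die "open: $!"; print $p $_; close($p) or die "close: $!"' input.txt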

If Python is an option, this code may be of use:

IFS= read -r -d '' python_code <<'_END_PYTHON_CODE_'
import sys
from subprocess import Popen, PIPE

for line in sys.stdin:
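    # Start a fresh instance of the command for every input line and
    # feed the line to its standard input; leaving the 'with' block
    # closes the pipe and waits for the process to exit.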
    with Popen(sys.argv[1:], stdin=PIPE, text=True) as p:
        p.stdin.write(line)
_END_PYTHON_CODE_

python -c "$python_code" ./process.sh <input.txt
  • I don't have much experience with Python, so the code may not be ideal. It runs slightly slower than the Perl code in my testing.
pjh