
Okay, so I'm in a situation where I'd really like to use either a co-process via coproc or process substitution such as <(some command), but unfortunately one of my target environments is limited to bash 3.2, which restricts what I can do.

The reason I need a co-process is that I need to read line-by-line from one file, while looping over another.

Currently I'm using exec 6< /foo/bar to keep a file open for reading, so that I can do read line <&6 whenever I need more input. This works fine, but only on plain-text files; I'd really like to keep my file(s) compressed rather than decompressing them before running my script.

I also need to do the same when writing to a new, compressed file, without wasting space by writing plain text and compressing it afterwards.

So… are there any alternatives available in bash 3? As I've noted, I'm already looping over another file, so I can't simply pipe my output into gzip (or pipe zcat into my loop); I need to do this independently of the loop.

To give an example, here's a stripped-down version of what I'm doing now:

# Decompress compressed match-file
gzip -dc /foo/compressed.gz > /tmp/match

# Setup file handles (to keep files open for reading/writing)
exec 5< /tmp/match
exec 6> /tmp/matches

# Loop over input file (/foo/bar) for matches
read next_match <&5
while read line; do
    if [ "$line" = "$next_match" ]; then
        read next_match <&5
        echo "$line" >&6
    fi

    echo "$line"
done < /foo/bar

# Close file handles
exec 5<&-
exec 6>&-
rm /tmp/match

# Compress matches and overwrite old match file
gzip -c9 /tmp/matches > /foo/compressed.gz
rm /tmp/matches

Forgive any typos and the general uselessness of the actual script; I just wanted to keep it simple. As you can see, while it works, it's not exactly optimal because of the wasteful plain-text files.

Haravikk
  • Can you try switching your while loop to another fd and show what happens? Read normally from stdin for the other reads, and just redirect to a file without specifying a new fd: use `while read line <&3` (or whatever fd you like) and `done 3< /foo/bar` at the end. – Reinstate Monica Please Jan 26 '14 at 19:06
  • While process substitution isn't POSIX-compatible, it is available in `bash` 3.2. – chepner Jan 26 '14 at 20:57
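The first comment's fd-3 suggestion can be taken a step further: once the loop reads its own input from fd 3, stdin and stdout are free again, so ordinary pipelines can do the decompression and compression with no FIFOs and no process substitution. This is a sketch, not the commenter's code: `filter_matches` is a hypothetical helper name, the files under `/tmp` are throwaway demo data (borrowed from halfbit's answer), and fd 4 is an assumed spare descriptor used to reach the caller's original stdout. It uses only POSIX constructs, so it may also work under a plain `sh`, but treat that as untested.

```shell
#!/bin/bash
# filter_matches (hypothetical helper):
#   $1 = gzip-compressed match list, $2 = plain input, $3 = compressed output
# Every input line goes to the caller's stdout (via fd 4); lines that match
# the next entry of the match list are written, compressed, to $3.
filter_matches() {
    {
        gzip -dc "$1" | {
            IFS= read -r next_match                # first entry, from stdin
            while IFS= read -r line <&3; do        # loop input comes from fd 3
                if [ "$line" = "$next_match" ]; then
                    IFS= read -r next_match        # next entry, from stdin
                    printf '%s\n' "$line"          # stdout -> gzip below
                fi
                printf '%s\n' "$line" >&4          # caller's original stdout
            done 3< "$2"
        } | gzip -c9 > "$3"
    } 4>&1
}

# Demo with throwaway files.
dir=/tmp/fd3demo.$$
mkdir -p "$dir"
printf '%s\n' a b c d e f g h | gzip > "$dir/foo.gz"
printf '%s\n' a a f b c h d d j j k l s e f k s i > "$dir/bar"

filter_matches "$dir/foo.gz" "$dir/bar" "$dir/matches.gz" > "$dir/stdout"
```

Because everything runs as one pipeline, the shell waits for the final `gzip` before continuing, so there is no race before renaming or reusing the output. The trade-off is that the loop runs in a pipeline subshell, so variables set inside it are not visible after `filter_matches` returns.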

2 Answers


You might want to use named pipes (created with mknod or mkfifo) and let gzip read from and write to them in background processes. The following seems to work for me:

#!/bin/bash

# create test files (one character per line)
echo abcdefgh | grep -o . | gzip > /tmp/foo.gz
echo aafbchddjjklsefksi | grep -o . > /tmp/bar

# create pipes for zipping and unzipping
PIPE_GUNZIP=/tmp/$$.gunzip
PIPE_GZIP=/tmp/$$.gzip
mkfifo "$PIPE_GUNZIP"
mkfifo "$PIPE_GZIP"

# use pipes as endpoints for gzip / gunzip
gzip -dc /tmp/foo.gz > "$PIPE_GUNZIP" &
GUNZIP_PID=$!
gzip -c9 > /tmp/foo.gz.INCOMPLETE < "$PIPE_GZIP" &
GZIP_PID=$!

exec 5< "$PIPE_GUNZIP"
exec 6> "$PIPE_GZIP"

read next_match <&5
while read line; do
    if [ "$line" = "$next_match" ]; then
        read next_match <&5
        echo "$line" >&6
    fi

    echo "$line"
done < /tmp/bar

# Close file handles
exec 5<&-
exec 6>&-

# wait for gzip to terminate, replace input with output, clean up
wait $GZIP_PID
mv /tmp/foo.gz.INCOMPLETE /tmp/foo.gz
rm "$PIPE_GZIP"

# wait for gunzip to terminate, clean up
wait $GUNZIP_PID
rm "$PIPE_GUNZIP"

# check result
ls -l /tmp/{foo,bar}*
gzip -dc /tmp/foo.gz
halfbit
  • `mkfifo` provides a simpler interface to creating a named pipe than using `mknod` directly. – chepner Jan 26 '14 at 20:55
  • @chepner: I agree, thanks. Replaced `mknod XXX p` with `mkfifo XXX`. – halfbit Jan 26 '14 at 21:04
  • Also, +1, since this is how to work around the lack of process substitution in the current POSIX standard. – chepner Jan 26 '14 at 21:06
  • Just wanted to say thanks for this answer, and also to note that in my specific case `mknod` is actually the more useful option, as it seems to be available on all the environments I work with while `mkfifo` is not, though I agree `mkfifo` is easier to use. – Haravikk Feb 07 '14 at 15:39
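Since `mknod` turned out to be more widely available than `mkfifo` on the asker's devices, a small fallback helper may be handy. This is a sketch: `make_fifo` is a hypothetical name, `mknod PATH p` is the traditional `mknod` form for creating a FIFO, and the pipe paths reuse the answer's variable names so they can drop into its script. The `trap ... EXIT` additionally removes the pipes if the script stops early, which the answer's code does not handle.

```shell
#!/bin/bash
# make_fifo (hypothetical helper): prefer mkfifo, fall back to 'mknod PATH p'.
make_fifo() {
    if command -v mkfifo >/dev/null 2>&1; then
        mkfifo "$1"
    else
        mknod "$1" p
    fi
}

PIPE_GUNZIP=/tmp/$$.gunzip
PIPE_GZIP=/tmp/$$.gzip

# Remove both pipes however the script exits (normal end, error, or signal
# that lets bash run its EXIT trap).
trap 'rm -f "$PIPE_GUNZIP" "$PIPE_GZIP"' EXIT

make_fifo "$PIPE_GUNZIP" && make_fifo "$PIPE_GZIP"
```

With the trap in place, the explicit `rm` calls at the end of the answer's script become optional.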

Since process substitution is available in bash 3.2, you can simply use it.

# Setup file handles (to keep files open for reading/writing)
exec 5< <( gzip -dc /foo/compressed.gz )
exec 6> >( gzip -c9 > /foo/new_compressed.gz )

# Loop over input file (/foo/bar) for matches
read next_match <&5
while read line; do
    if [ "$line" = "$next_match" ]; then
        read next_match <&5
        echo "$line" >&6
    fi

    echo "$line"
done < /foo/bar

# Close file handles
exec 5<&- 6>&-

# Overwrite old match file
mv /foo/new_compressed.gz /foo/compressed.gz
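One caveat with this approach: bash 3.2 offers no way to `wait` for a process substitution, so the final `mv` can run before `gzip` has finished writing the new file. A sketch of one workaround, using throwaway files under `/tmp` and a hypothetical marker file named `done`: the substitution renames its own output once `gzip` succeeds, then creates the marker, which the main script polls for. It assumes the `exec` runs at the top level of the script rather than inside a function, since `gzip` only sees end-of-file once every copy of the pipe's write end is closed.

```shell
#!/bin/bash
dir=/tmp/procsub.$$
mkdir -p "$dir"
printf '%s\n' x y z > "$dir/lines"

# The substitution compresses to a temporary name, renames it only if gzip
# succeeded, and then signals completion with a marker file.
exec 6> >(gzip -c9 > "$dir/new.gz.tmp" && mv "$dir/new.gz.tmp" "$dir/new.gz"; : > "$dir/done")

cat "$dir/lines" >&6     # stands in for the real loop writing to fd 6
exec 6>&-                # closing fd 6 lets gzip see end-of-file

# Poll for the marker, giving up after roughly 30 seconds.
tries=0
while [ ! -e "$dir/done" ] && [ "$tries" -lt 30 ]; do
    sleep 1
    tries=$((tries + 1))
done
```

Polling with one-second `sleep` calls is crude but portable; it only matters when the script needs the finished file before it exits.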
chepner
  • Unfortunately as I mentioned in my question, some of the environments (NAS devices mainly) I have to work with don't have a recent enough version of bash and installing one isn't an option (it's supposed to be portable). Actually some of them don't seem to have `mkfifo` either, so I guess I'll just have to run a check and use the best method available. – Haravikk Feb 02 '14 at 15:12
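The runtime check mentioned in the last comment could look something like the sketch below. The names `have_process_substitution` and `method` are hypothetical. Rather than comparing version numbers, the check actually runs a tiny process substitution inside `eval` in a subshell, so a shell that rejects `<( )` fails quietly instead of aborting the script. This also covers bash running in POSIX mode (for example when invoked as `sh`), where older versions disable process substitution.

```shell
#!/bin/bash
# Succeeds only if this shell can actually run a process substitution.
have_process_substitution() {
    # eval in a subshell: an unsupported <( ) is a syntax error there,
    # which cannot abort the calling script.
    [ "$( (eval 'cat < <(echo ok)') 2>/dev/null )" = "ok" ]
}

# Pick the best available method, in order of preference.
if have_process_substitution; then
    method=procsub
elif command -v mkfifo >/dev/null 2>&1; then
    method=mkfifo
elif command -v mknod >/dev/null 2>&1; then
    method=mknod
else
    method=tempfile    # last resort: decompress to a temporary file
fi
echo "using: $method"
```

The script can then branch on `$method` to choose between the process-substitution, named-pipe, or temporary-file versions shown in this thread.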