
Is there a way to read from two files at the same time in a bash script? Something similar to the usage of & in a programming language, for example:

for i in $(cat filea.txt) & for j in $(cat fileb.txt)
do
    if "$j" = "$i"
    then echo "$i"
    fi
done

Meaning: I want line 1 of filea to be compared with line 1 of fileb, line 2 of filea to be compared with line 2 of fileb, and so on.

Jonathan Leffler
t28292
  • I don't think there is any such construct. If there is, it will be highly specific to `bash`. – Jonathan Leffler Aug 07 '13 at 06:38
  • I believe you are looking for lock-step iteration in bash – Snakes and Coffee Aug 07 '13 at 06:38
  • What is `$cat`? I don't see how the `cat` command could be executed here without backticks around `cat filea.txt`. And even that selects words, not lines. (The correction supplied by Jonathan does the same.) – Potatoswatter Aug 07 '13 at 06:40
  • @SnakesandCoffee: I can't find 'lock-step' or 'lockstep' in the Bash 4.2 reference manual...can you elaborate on what you are talking about? Is it an actual feature, or 'what would be required'? – Jonathan Leffler Aug 07 '13 at 06:42
  • @JonathanLeffler Sounds like he just made that up. But interleaving the two files together might help, if we also have a way to extract lines one at a time: http://stackoverflow.com/questions/4011814/how-to-interleave-lines-from-two-text-files . – Potatoswatter Aug 07 '13 at 06:43
  • @Potatoswatter: I made the change because it seemed more like what was being requested. As originally written `(for i in $cat filea.txt)` would, ignoring the parentheses, have expanded the variable `$cat` and used the words in that plus the name `filea.txt` as the words assigned to `$i`. In `bash`, you should write `for i in $( – Jonathan Leffler Aug 07 '13 at 06:44
  • @JonathanLeffler it's just the name of the kind of iteration you want, I believe, similar to Python's and Ruby's zip function. – Snakes and Coffee Aug 07 '13 at 07:26
  • @SnakesandCoffee: Thanks. As a name, 'lock-step iteration' makes sense and is nicely descriptive, but it sounded like you might be mentioning a feature of `bash`. It might have misled someone reading the comment into looking for the feature. If you'd said: "I believe you are looking for 'lock-step iteration' in `bash` like the `zip` function in Python and Ruby." then there would have been no need for a discussion. Something to keep in mind...not a big problem. – Jonathan Leffler Aug 07 '13 at 07:35

2 Answers


What you could do to compare lines, rather than words, is a little contorted, but:

while read -r -u 3 line1 && read -r -u 4 line2
do
    if [ "$line1" = "$line2" ]
    then echo "$line2"
    fi
done 3<filea.txt 4<fileb.txt

I've explicitly used file descriptors 3 and 4 for symmetry (Polya: "Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry"); it would also be possible, but less clear, to write:

while read -r line1 && read -r -u 3 line2
do
    if [ "$line1" = "$line2" ]
    then echo "$line2"
    fi
done <filea.txt 3<fileb.txt

What does read -u stand for, and what do 3 and 4 stand for?

Read the Fine Manual: the bash manual is available online, and if you haven't already bookmarked it, now would be a fine time to do so.

The -u fd option means 'read from the designated file descriptor (3 or 4 in the example code) instead of from standard input' (which is file descriptor 0, of course). The 3 and 4 are the file descriptors to read from. The code redirects file descriptor 3 from filea.txt and file descriptor 4 from fileb.txt for the duration of the while loop. Any process in the loop that reads file descriptor 3 gets data from filea.txt; likewise, file descriptor 4 gives data from fileb.txt. The script directs one read command to read from 3 and the other from 4, and read reads one line at a time, so the net effect is that the loop runs while there is data available from both filea.txt and fileb.txt, comparing the two lines and printing the matching ones. As soon as either read fails at end of file, the && condition is false and the loop ends.
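To make the lock-step behaviour concrete, here is a small self-contained run of the first loop wrapped in a function. The function name same_lines, the temporary sample files, and the IFS= addition (which keeps leading and trailing whitespace on each line) are illustrative assumptions, not part of the answer above.

```shell
#!/bin/bash
# Lock-step, line-by-line comparison of two files (illustrative sketch).

# same_lines FILE_A FILE_B
# Prints each line that is identical at the same position in both files.
same_lines() {
    local line1 line2
    # fd 3 reads FILE_A, fd 4 reads FILE_B; the loop ends as soon as
    # either read fails, i.e. when the shorter file runs out of lines.
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r -u 3 line1 && IFS= read -r -u 4 line2
    do
        if [ "$line1" = "$line2" ]
        then printf '%s\n' "$line1"
        fi
    done 3<"$1" 4<"$2"
}

# Made-up sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$tmpdir/filea.txt"
printf '%s\n' apple blueberry cherry > "$tmpdir/fileb.txt"

same_lines "$tmpdir/filea.txt" "$tmpdir/fileb.txt"
# prints:
# apple
# cherry

rm -r "$tmpdir"
```

Because the loop condition needs both reads to succeed, the extra line "date" in filea.txt is never examined.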

If your objective is to print the lines that are common to two files, then using the comm command is much better; see Jonas Elfström's answer for a good way to do that.
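A minimal sketch of the comm approach, assuming the inputs may be unsorted. comm only works on sorted input, so each file is sorted on the fly with process substitution; the function name common_lines is hypothetical. Unlike the lock-step loop, this reports lines present in both files regardless of their positions.

```shell
#!/bin/bash
# Lines present in both files, regardless of position (illustrative sketch).

# common_lines FILE_A FILE_B
# comm needs sorted input, so each file is sorted on the fly via
# process substitution (a bash feature; plain sh does not have it).
# -1 hides lines only in FILE_A, -2 hides lines only in FILE_B,
# leaving just the lines common to both.
common_lines() {
    comm -12 <(sort "$1") <(sort "$2")
}

tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/filea.txt"
printf '%s\n' cherry apple > "$tmpdir/fileb.txt"

common_lines "$tmpdir/filea.txt" "$tmpdir/fileb.txt"
# prints:
# apple
# cherry

rm -r "$tmpdir"
```

For this sample the lock-step loop would print nothing, because no line matches at the same position; that difference is the main thing to weigh when choosing between the two.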

Assuming your objective is to read lines from two files in parallel, I can't immediately think of another way to do this in bash. If you need words from two files in parallel (which is what the for i in $(<filea.txt) notation would give), you'd probably need to use process substitution in conjunction with the I/O redirection:

while read -r -u 3 word1 && read -r -u 4 word2
do
    if [ "$word1" = "$word2" ]
    then echo "$word2"
    fi
done 3< <(printf "%s\n" $(<filea.txt)) 4< <(printf "%s\n" $(<fileb.txt))

There are other ways to achieve that too, notably using tr instead of printf, which would be better if the files are big (many megabytes big).
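A sketch of the tr variant, assuming words are separated by any whitespace. tr streams each file instead of expanding it into one shell word list, and its output is not subject to pathname expansion (a * in the file stays a literal *). The grep -v '^$' step drops the empty line tr produces when a file starts with whitespace. The helper names split_words and same_words are hypothetical.

```shell
#!/bin/bash
# Word-by-word lock-step comparison using tr (illustrative sketch).

# split_words FILE: one whitespace-separated word per line.
# tr -s squeezes each run of whitespace into a single newline;
# grep -v '^$' drops the empty line left by leading whitespace.
split_words() {
    tr -s '[:space:]' '\n' <"$1" | grep -v '^$'
}

# same_words FILE_A FILE_B: print words equal at the same word position.
same_words() {
    local word1 word2
    while read -r -u 3 word1 && read -r -u 4 word2
    do
        if [ "$word1" = "$word2" ]
        then printf '%s\n' "$word1"
        fi
    done 3< <(split_words "$1") 4< <(split_words "$2")
}

tmpdir=$(mktemp -d)
printf 'the quick brown\nfox\n' > "$tmpdir/filea.txt"
printf 'the slow\nbrown   fox\n' > "$tmpdir/fileb.txt"

same_words "$tmpdir/filea.txt" "$tmpdir/fileb.txt"
# prints:
# the
# brown
# fox

rm -r "$tmpdir"
```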

Community
Jonathan Leffler
  • What does read -u stand for? And what do 3 and 4 stand for? – t28292 Aug 07 '13 at 06:52
  • @user2613272 according to `help read`, `-u` requests a particular file descriptor. 3 and 4 are file descriptors. In the latter example, he's reading from standard input. – Potatoswatter Aug 07 '13 at 07:05
  • What about the -r, by the way? What precisely is it used for? – t28292 Aug 07 '13 at 07:21
  • @Potatoswatter: I guess the days are long gone when the `help` command was part of SCCS. – Jonathan Leffler Aug 07 '13 at 07:22
  • So you've still not learned how to read the manual, or use the information from `help read` as suggested by Potatoswatter? `help read` says, in part: _If the `-r` option is given, this signifies `raw' input, and backslash escaping is disabled._ To find out more about what that means, read the manual. – Jonathan Leffler Aug 07 '13 at 07:25
  • One last question on this issue: I used this code: while read -r -u 3 line1 && read -r -u 4 line 2 do echo"$line1;$line2" echo "0$line1;$line2" done 3 – t28292 Aug 07 '13 at 09:02
  • You turn on the 'prescience' switch so that the program knows when it reaches the last three lines of file 2. :D ... There isn't a particularly easy way to do it, and the solutions are indirect. You arrange to use a program that deletes the last 3 lines from file 2 before feeding it to the `while` loop. That isn't particularly trivial; `sed` doesn't help; neither do `tail` or `head` (that I can see; check GNU `head` which may be able to do it). I'd be tempted to use `ed file2 <<< '1,$-3p'` but there ought to be a better way. This would use process substitution, too. – Jonathan Leffler Aug 07 '13 at 09:18
  • I used your final code but I got "/dev/fd/62: No such file or directory", and if I try the ordinary one I get "invalid file descriptor: bad file number". Any suggestions? – t28292 Aug 13 '13 at 07:21
  • Which version of `bash`, on which machine type? Are you using `/bin/sh` and is it actually `bash` or something else? IIRC, if `bash` is invoked as `sh`, it does not do process substitution. – Jonathan Leffler Aug 13 '13 at 11:32
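For the follow-up about leaving out the last three lines of file 2, one option is to trim the file before it reaches the loop, using process substitution as suggested in the comments. The sketch below uses a small awk ring buffer for portability (on systems with GNU coreutils, head -n -3 does the same job). The helper names drop_last and pair_lines are hypothetical, and the a;b output format follows the echo in the comment above. Like every process-substitution example here, it must be run by bash, not sh.

```shell
#!/bin/bash
# Lock-step loop that ignores the last N lines of the second file
# (illustrative sketch; helper names are hypothetical).

# drop_last N FILE: print FILE without its final N lines (N >= 1).
# awk keeps the most recent N lines in a ring buffer and only prints
# a line once N newer lines have been seen after it.
drop_last() {
    awk -v n="$1" '
        NR > n { print buf[NR % n] }
        { buf[NR % n] = $0 }
    ' "$2"
}

# pair_lines FILE_A FILE_B: print "lineA;lineB" pairs, skipping the
# last 3 lines of FILE_B.
pair_lines() {
    local line1 line2
    while IFS= read -r -u 3 line1 && IFS= read -r -u 4 line2
    do
        printf '%s;%s\n' "$line1" "$line2"
    done 3<"$1" 4< <(drop_last 3 "$2")
}

tmpdir=$(mktemp -d)
printf '%s\n' a1 a2 a3 a4 a5 > "$tmpdir/file1"
printf '%s\n' b1 b2 b3 b4 b5 > "$tmpdir/file2"

pair_lines "$tmpdir/file1" "$tmpdir/file2"
# prints:
# a1;b1
# a2;b2

rm -r "$tmpdir"
```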

If the goal is to output the lines that are the same in both files, then you can use comm (note that comm expects both files to be sorted):

comm -12 file1 file2

But if you actually need to step through two files in parallel, then you can use file descriptors:

#! /bin/bash
while read -r lineF1 <&3 && read -r lineF2 <&4; do
  echo "$lineF1"
  echo "$lineF2"
done 3<file1 4<file2
Jonas Elfström
  • +1 for lateral thinking — though I think it misses the point of the question which is whether there's a way to step through the lines of two files in parallel. – Jonathan Leffler Aug 07 '13 at 06:51
  • Yes, I used it and it worked perfectly. One more question, though: how can I exclude the last 3 lines of file2? I don't want them to be in my output. – t28292 Aug 07 '13 at 09:06
  • This variant will stop when either of the files ends. Are there an equal number of lines in the files? – Jonas Elfström Aug 07 '13 at 09:25
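As noted, these loops stop as soon as either file ends. If you instead want to keep going until both files are exhausted, one sketch is to track end of file for each descriptor separately and print an empty field for the file that has run out. The function name zip_lines and the empty-field convention are assumptions for illustration.

```shell
#!/bin/bash
# Read two files in step until BOTH are exhausted (illustrative sketch).

# zip_lines FILE_A FILE_B: print "lineA;lineB" for every line position;
# once one file has run out, its side of the pair is left empty.
zip_lines() {
    local line1 line2 more1=1 more2=1
    while :
    do
        # A failed read means end of file on that descriptor.
        IFS= read -r -u 3 line1 || { more1=0; line1=; }
        IFS= read -r -u 4 line2 || { more2=0; line2=; }
        # Stop only when neither file produced a line.
        if [ "$more1" -eq 0 ] && [ "$more2" -eq 0 ]
        then break
        fi
        printf '%s;%s\n' "$line1" "$line2"
    done 3<"$1" 4<"$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' a1 a2 a3 > "$tmpdir/file1"
printf '%s\n' b1 > "$tmpdir/file2"

zip_lines "$tmpdir/file1" "$tmpdir/file2"
# prints:
# a1;b1
# a2;
# a3;

rm -r "$tmpdir"
```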