Filtering specific lines in one file based on another file in a shell

Question

Imagine having these two files:

First:

# foo
http://some.url/
# bar
http://foo.url/
# bar
http://and.one.more/url

Second:

foo
doo

Now, I want to print out from the First file those lines that start with # and contain words from the Second file -- and not only these lines but also those urls that follow matched lines.

At first, it seemed I could use grep:

grep -f Second -A 1 First

But, of course, that was a mistake:

# foo
http://some.url/
--
http://foo.url/
# bar

So, my question is, how can I limit filtering only to those lines that start with #? And upon finding such lines, print those out, as well as the next line after those. Would be great if that could be done with some standard tools, like grep, sed, or awk.

Desired outcome for this very example would be:

# foo
http://some.url/

EDIT: Sorry for disturbing you all, for my particular case I decided simply to join lines temporarily, then to grep -f Second First, and then to split the resulting lines back on printing out.

What you describe in your latest edit `I decided simply to join lines temporarily, then to grep -f Second First` can't possibly work. It may produce the output you expect from some given input set but it is very fragile and so will fail given other input. — Ed Morton, Mar 20 '20 at 00:50
@EdMorton You probably have some scenario in mind? Some specific pitfalls? — A S, Mar 20 '20 at 00:57
There are many scenarios but the most obvious one is the 2nd block of 2 lines from `First` in your example. Since `foo` appears in the URL (`http://foo.url/`), if you combine every pair of lines and then grep, those 2 lines will get printed despite `foo` not being present in the line that starts with `#` (`# bar`). By combining the lines first you make this a much harder problem to solve because now you need to distinguish between strings that match in the URLs vs strings that match in the original `#` lines. Then, with the grep you posted, you'd also have to worry about partial and/or regexp matches. — Ed Morton, Mar 20 '20 at 01:01
Patterns in Second can be tweaked not to consider the url. Like, adding `^` in the beginning, for example. — A S, Mar 20 '20 at 01:05
That's not the only issue and again - you're just making the job harder unnecessarily, you are going down the wrong tracks with the approach of combining the lines. You got several answers that do what you said you wanted, you should accept one and if, as it sounds like, you have a different problem from the one you posted then post a new question if you'd like help with it. — Ed Morton, Mar 20 '20 at 01:07
The point of a community is not to provide a naked workaround from input to output but to help to learn. So, no, thanks, but I will not accept anything here. Sorry I wasted your time you could have used to add some comments to your code. — A S, Mar 20 '20 at 01:21
As for "making the job harder unnecessarily": no, I don't think so. It just requires more precise patterns, which -- in my world -- is a good thing. — A S, Mar 20 '20 at 01:22
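
The pitfall raised in the comments can be reproduced with a short sketch. It assumes the lines are joined in pairs with `paste` and a tab separator; the question does not say how the join was done, so that part is a guess, and the scratch directory is only there to keep the example self-contained.

```shell
# Scratch directory with the question's sample files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '# foo' 'http://some.url/' '# bar' 'http://foo.url/' \
  '# bar' 'http://and.one.more/url' > First
printf '%s\n' 'foo' 'doo' > Second

# Assumption: join each pair of lines with a tab, filter, then split again.
joined=$(paste - - < First | grep -f Second | tr '\t' '\n')
printf '%s\n' "$joined"
```

Besides the wanted `# foo` pair, this also prints `# bar` and `http://foo.url/`, because `foo` occurs in the URL half of the joined line.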

Barmar · Answer 1 · 2020-03-19T23:25:54.150

2

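You can use process substitution to turn each line of Second into an anchored pattern, prefixing it with `^# ` and appending `$`, and use the result as the -f argument to grep. Note that the words are still treated as regular expressions.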

grep -f <(sed 's/.*/^# &$/' Second) -A 1 First
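
To see what grep actually receives, the sed step can be run on its own. This is a minimal sketch that writes the generated patterns to a file instead of using process substitution, so it also runs in shells without `<(...)`. The file name `patterns.txt` and the scratch directory are illustrative, not part of the answer.

```shell
# Scratch directory with the question's sample files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '# foo' 'http://some.url/' '# bar' 'http://foo.url/' \
  '# bar' 'http://and.one.more/url' > First
printf '%s\n' 'foo' 'doo' > Second

# Each word becomes a regex that must match a whole "# word" line.
sed 's/.*/^# &$/' Second > patterns.txt
patterns=$(cat patterns.txt)
printf '%s\n' "$patterns"

# Same filtering as the one-liner, with the pattern file made explicit.
result=$(grep -f patterns.txt -A 1 First)
printf '%s\n' "$result"
```

The pattern file contains `^# foo$` and `^# doo$`, so `http://foo.url/` no longer matches and only the `# foo` line and the URL after it are printed.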

edited Mar 19 '20 at 23:25

answered Mar 19 '20 at 23:22

Barmar


score 2 · Answer 2 · answered Mar 19 '20 at 23:22

sed in a process substitution will work:

grep -A1 -Fwf <(sed 's/^/# /' Second) First
# foo
http://some.url/
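
The fixed-string variant (`-F` with `-w`) and the anchored-regex variant behave differently when a header line contains more than the bare word. The sketch below uses made-up input (the `example` URLs and the `# foo bar` / `# food` headers are hypothetical, not from the question) and temporary pattern files in place of process substitution.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical input: one header with an extra word, one with a longer word.
printf '%s\n' '# foo bar' 'http://a.example/' '# food' 'http://b.example/' > First
printf '%s\n' 'foo' > Second

# Anchored regex: the line must be exactly "# foo", so nothing matches here.
sed 's/.*/^# &$/' Second > anchored.txt
anchored=$(grep -f anchored.txt -A 1 First)

# Fixed string with -w: "# foo" must end at a word boundary, so
# "# foo bar" matches while "# food" does not.
sed 's/^/# /' Second > fixed.txt
fixed=$(grep -A1 -Fwf fixed.txt First)

printf 'anchored: [%s]\n' "$anchored"
printf 'fixed:    [%s]\n' "$fixed"
```

Which behavior is wanted depends on whether the `#` lines always hold a single word. `-F` also means characters such as `.` in Second are taken literally rather than as regex metacharacters.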

answered Mar 19 '20 at 23:22

David C. Rankin


score 0 · Answer 3 · answered Mar 20 '20 at 00:04

The simplest way I can think of is to first add some labels to both files.

File A:-

#----------+foo----------
# foo
http://some.url/
#----------.foo----------

File B:-

#----------+foo----------
doo
#----------.foo----------

Now you can just do:-

#!/bin/sh

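# Print each labelled block; "p" is required with -n, and the dot is
# escaped so that /\.foo/ only matches the closing label.
sed -n "/+foo/,/\.foo/p" filea.txt >> newfile.txt
sed -n "/+foo/,/\.foo/p" fileb.txt >> newfile.txt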

cat > edpop+.txt << EOF
4,5d
wq
EOF

ed -s newfile.txt < edpop+.txt

This will get rid of the middle two bands, once you've got your text together in the one file.
