1

I am querying one file against another. The files look like this:

File1:

Angela S Darvill| text text text text   
Helen Stanley| text text text text   
Carol Haigh S|text text text text .....

File2:

Carol Haigh  
Helen Stanley  
Angela Darvill

This command:

awk 'NR==FNR{_[$1];next} ($1 in _)' File2.txt File1.txt

returns every line whose first word matches, but that is not a strict match. With a strict match, only Helen Stanley should be returned.

How do I restrict awk to a strict match?

RavinderSingh13
belovedname
    Don't use `_` as a variable name - that's even worse than a single-letter variable name for readability and there's just no reason not to at least come up with a single letter, even if it's just `a` for "array" in this case. – Ed Morton Jan 30 '22 at 14:09

2 Answers

4

With your shown samples, please try the following. You were on the right track; you need to do two things: first, use the whole line as the index in array a while reading File2.txt, and second, set the field separator to | before awk starts reading File1.txt.

awk -F'|' 'NR==FNR{a[$0];next} $1 in a' File2.txt File1.txt

Command above doesn’t work for me (I am on Mac, don’t know whether it matters), but

awk 'NR==FNR{_[$0];next} ($1 in _)' File2.txt. FS="|" File1.txt

worked well
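
One thing that may be worth checking: if File2.txt really contains the trailing spaces shown in the question, the keys stored from `$0` will not equal `$1` from File1.txt, and nothing is printed. A minimal sketch that trims trailing blanks before storing the key follows. The scratch directory, the sample data, and the array name `names` are assumptions made up to mirror the question, not part of the original commands.

```shell
# Recreate the question's sample files in a scratch directory (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' 'Angela S Darvill| text text text text' \
              'Helen Stanley| text text text text' \
              'Carol Haigh S|text text text text' > "$tmp/File1.txt"
# The first two File2 lines carry two trailing spaces, as in the question.
printf '%s\n' 'Carol Haigh  ' 'Helen Stanley  ' 'Angela Darvill' > "$tmp/File2.txt"

# First file: strip trailing spaces/tabs, then store the whole line as a key.
# Second file: print the line only if the text before the first "|" is a stored key.
result=$(awk -F'|' '
    NR == FNR { sub(/[ \t]+$/, ""); names[$0]; next }
    $1 in names
' "$tmp/File2.txt" "$tmp/File1.txt")

printf '%s\n' "$result"   # prints: Helen Stanley| text text text text
rm -rf "$tmp"
```

Because `in` compares array keys as literal strings, "Carol Haigh" does not match "Carol Haigh S" and "Angela Darvill" does not match "Angela S Darvill".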

belovedname
RavinderSingh13
  • @belovedname, you're welcome. One more note: if your file2 never contains `|`, you can set `FS="|"` in the `BEGIN` section of the `awk` program itself; otherwise keep the command as it is currently shown in my answer. – RavinderSingh13 Jan 30 '22 at 12:00
  • 2
    It doesn't matter if file2 has `|` in it or not since you never use individual fields from file2, just $0. – Ed Morton Jan 30 '22 at 14:17
  • 1
    @belovedname the command you edited into Ravinder's answer is functionally **exactly** the same command as is already in it, except your first file name ends in a `.`, which is almost certainly just a typo. Being on a Mac makes no difference. It's simply impossible for one of those commands to work for you but the other not work. – Ed Morton Jan 31 '22 at 04:11
0

You can also use grep, treating the lines of File2.txt as a list of regexes, to get an exact match.

You can use sed to prepare the matches. Here is an example:

sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt
^Carol Haigh|
^Helen Stanley|
^Angela Darvill|
...

Then use process substitution to pass that sed output as the -f argument to grep:

grep -f <(sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt) File1.txt
Helen Stanley| text text text text  

Since your example File2.txt has trailing spaces, the sed has s/[ \t]*$//; as the first substitution. If your actual file does not have those trailing spaces, you can do:

grep -f <(sed -E 's/.*/^&|/' File2.txt) File1.txt

Ed Morton brings up a good point that grep will still interpret RE meta-characters in File2.txt. You can use the flag -F so only literal strings are used:

grep -F -f <(sed -E 's/.*/&|/' File2.txt) File1.txt
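
One caveat with -F: fixed strings cannot be anchored, so a pattern such as `Helen Stanley|` is found anywhere in the line, not only at its start. A small sketch comparing the two variants follows; the sample names, the scratch directory, and the pattern files `fixed.txt` and `anchored.txt` are made up for illustration, and temporary files are used instead of process substitution so the snippet also runs in a plain POSIX shell.

```shell
# Hypothetical data: the first line merely ends its name field with "Helen Stanley".
tmp=$(mktemp -d)
printf '%s\n' 'Mary Helen Stanley| text' 'Helen Stanley| text' > "$tmp/File1.txt"
printf '%s\n' 'Helen Stanley' > "$tmp/File2.txt"

# Fixed-string patterns: "Helen Stanley|" may occur anywhere in the line.
sed 's/.*/&|/' "$tmp/File2.txt" > "$tmp/fixed.txt"
fixed_count=$(grep -c -F -f "$tmp/fixed.txt" "$tmp/File1.txt")

# Anchored regex patterns: "^Helen Stanley|" must start the line.
sed 's/.*/^&|/' "$tmp/File2.txt" > "$tmp/anchored.txt"
anchored_count=$(grep -c -f "$tmp/anchored.txt" "$tmp/File1.txt")

printf 'fixed-string matches: %s\n' "$fixed_count"     # 2
printf 'anchored matches:     %s\n' "$anchored_count"  # 1
rm -rf "$tmp"
```

So -F trades the start-of-line anchor for literal matching; which risk matters more depends on the names in the real data.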
dawg
  • 2
    You'd also need the `sed` to add escaping to make regexp meta-chars literal so that a name like `P.T.Barnum` doesn't falsely match `PAT Barnum`. See [is-it-possible-to-escape-regex-metacharacters-reliably-with-sed](https://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-metacharacters-reliably-with-sed) for how to do that (but obviously using a string match with awk is vastly simpler, more efficient, etc.) – Ed Morton Jan 30 '22 at 14:13
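
The false match described in the comment above can be reproduced with a short sketch. The sample lines, the scratch directory, and the `patterns.txt` file are assumptions invented for the demonstration; the regex patterns are built the same way as in the answer, only written to a file rather than passed through process substitution.

```shell
# Hypothetical sample data chosen to trigger the false match.
tmp=$(mktemp -d)
printf '%s\n' 'PAT Barnum| text' 'P.T.Barnum| text' > "$tmp/File1.txt"
printf '%s\n' 'P.T.Barnum' > "$tmp/File2.txt"

# Regex approach: each "." matches any character, so both lines are selected.
sed 's/.*/^&|/' "$tmp/File2.txt" > "$tmp/patterns.txt"
regex_hits=$(grep -c -f "$tmp/patterns.txt" "$tmp/File1.txt")

# String approach: awk compares array keys literally, so only the exact name is selected.
string_hits=$(awk -F'|' 'NR == FNR { names[$0]; next } $1 in names' \
    "$tmp/File2.txt" "$tmp/File1.txt")

printf 'regex matches: %s\n' "$regex_hits"    # 2
printf 'string match:  %s\n' "$string_hits"   # P.T.Barnum| text
rm -rf "$tmp"
```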