Subtract the content of a file from another file

Question

I'm working on a bash script whose main objective is to create a .conf file containing the lines of file 1 that do not appear in file 2 (i.e. file 1 minus file 2).
Example:
File 1

ready   serv1   FBgn001bKJ
ready   serv2   FBgn003mLo  
ready   serv3   FBgn002lPx  
ready   serv4   FBgn000Pas

File 2

ready   serv1   FBgn001bKJ
ready   serv4   FBgn000Pas

Result

ready   serv2   FBgn003mLo  
ready   serv3   FBgn002lPx

I've tried the following function, but it doesn't produce any output:

COMPARE_FILES() {
awk '
    NR==FNR {a[FNR]=$0; next}
    {
        b=$0; gsub(/[0-9]+/,"",b)
        c=a[FNR]; gsub(/[0-9]+/,"",c)
        if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}
    }' "$1" "$2"
}

Any suggestions on how I can make it work? PS: the whitespace between the fields can differ between the two files!

Please add your efforts (the code you tried) to your question, thank you. — RavinderSingh13, Sep 05 '22 at 12:29
Regarding `COMPARE_FILES()` - don't use all upper case names for functions, variables, etc. See [correct-bash-and-shell-script-variable-capitalization](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization). — Ed Morton, Sep 05 '22 at 13:56
Please edit your question to state (and show in your example) that, as we discovered in comments/testing, the white space between fields can be different between the 2 files and that there could be leading and/or trailing spaces in any line of either file and so you want to compare based on the values of the individual fields, not the whole line. — Ed Morton, Sep 05 '22 at 13:57
Why in your code are you removing digits, e.g. `gsub(/[0-9]+/,"",b)` and `gsub(/[0-9]+/,"",c)`? Should digits not be part of the comparison? If so, again update your example to demonstrate that. — Ed Morton, Sep 05 '22 at 13:58
Don't just say `PS : The whitespace between the two files can be different!`, edit your example to SHOW the white space being different, e.g. 5 blanks between fields in file1 vs 10 blanks between fields in file2. Show some lines with leading white space, others without. — Ed Morton, Sep 05 '22 at 14:00
Please fix your question to show your real problem for the benefit of others with similar question in future even if you believe you have an answer to your real problem. Also explain why you're removing digits in your code. — Ed Morton, Sep 05 '22 at 14:13

markp-fuso · Answer 1 · 2022-09-05T13:30:31.617

3

Assumptions:

each line within a file is unique (i.e., no duplicate lines exist within a given file)
matching lines are 100% identical (this actually isn't the case with OP's data as I found a variable number of trailing spaces in some lines; I manually removed all trailing spaces before running the following solutions)

One comm idea:

$ comm -23 file1 file2
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

NOTE: comm requires that the input files already be sorted (as OP's samples are)
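
If the inputs are not already sorted, one option is to sort them on the fly. Below is a minimal sketch, assuming bash (for process substitution); `subtract_sorted` is a hypothetical name, and forcing `LC_ALL=C` is my own choice so that sort and comm agree on the collation order. Note that the result then comes out in sorted order, not in file1's original order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of $1 that do not appear in $2.
# Both inputs are sorted first because comm only works on sorted input.
subtract_sorted() {
    # LC_ALL=C gives sort and comm the same byte-wise ordering.
    # -2 suppresses lines unique to $2, -3 suppresses lines common to both.
    LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Usage (placeholder file names): subtract_sorted file1 file2 > result.conf
```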

As for an awk solution:

$ awk 'FNR==NR {a[$0];next} !($0 in a)' file2 file1
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

NOTE: the 1st file fed to awk is file2
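
The same awk idiom with comments, wrapped in a small function so the argument order matches the question (file1 first). This is a sketch; `subtract_awk` and `seen` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the lines of $1 that are not present in $2,
# keeping the original order of $1.
subtract_awk() {
    awk '
        # FNR==NR only holds while reading the first file argument ($2):
        # store each of its lines as an array key, then move on.
        FNR == NR { seen[$0]; next }
        # For the second file argument ($1), print lines never stored.
        !($0 in seen)
    ' "$2" "$1"
}

# Usage (placeholder file names): subtract_awk file1 file2 > result.conf
```

One caveat of the `FNR==NR` test: if the first file fed to awk is empty, `FNR==NR` stays true while reading the second file, so nothing is printed.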

Modifying to remove trailing white space:

$ comm -23 <(sed 's/[[:space:]]*$//' file1) <(sed 's/[[:space:]]*$//' file2)
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

$ awk '{sub(/[[:space:]]*$/,"")} FNR==NR {a[$0];next} !($0 in a)' file2 file1
ready   serv2   FBgn003mLo
ready   serv3   FBgn002lPx

edited Sep 05 '22 at 13:30

answered Sep 05 '22 at 13:16

markp-fuso


It doesn't work, it just gives the content of file 2 – youssef Elhanafi Sep 05 '22 at 13:29
@youssefElhanafi which of the 4 commands doesn't work? The 1st and 2nd commands are going to generate unwanted results if the input files (as with your samples) contain a variable number of trailing spaces on various lines; in the case of a variable number of trailing spaces try the 3rd and 4th commands – markp-fuso Sep 05 '22 at 13:32
All four commands give either the content of file 1 or of file 2, but never the difference – youssef Elhanafi Sep 05 '22 at 13:38
@youssefElhanafi the most likely problem is that white space between columns in `file1` isn't the same as it is in `file2`, e.g. maybe one is blanks while the other is tabs. Try `awk '{orig=$0; $1=$1} FNR==NR {a[$0];next} !($0 in a){print orig}' file2 file1` or `awk '{key=$0; gsub(/[[:space:]]+/," ",key); gsub(/^ +| +$/,"",key)} FNR==NR {a[key];next} !(key in a)' file2 file1` – Ed Morton Sep 05 '22 at 13:46

@EdMorton You're right, I replaced the white space with ; and the comm command worked! – youssef Elhanafi Sep 05 '22 at 13:50

awk '{orig=$0; $1=$1} FNR==NR {a[$0];next} !($0 in a){print orig}' file2 file1 worked fine too, without modification – youssef Elhanafi Sep 05 '22 at 13:52
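
The `$1=$1` command from the comments can be wrapped for the question's original goal of producing a .conf file. Below is a minimal sketch; `subtract_fields` is a hypothetical name. Assigning `$1=$1` makes awk rebuild `$0` from its fields joined by a single blank, so blanks vs. tabs and leading/trailing white space no longer affect the comparison, while the untouched line is what gets printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of $1 whose fields do not match
# any line of $2, comparing field by field rather than byte by byte.
subtract_fields() {
    awk '
        {
            orig = $0   # keep the line exactly as written
            $1 = $1     # rebuild $0 from its fields, single-blank separated
        }
        FNR == NR { seen[$0]; next }   # first argument: lines to remove
        !($0 in seen) { print orig }   # second argument: keep unmatched lines
    ' "$2" "$1"
}

# Usage (placeholder file names): subtract_fields file1 file2 > result.conf
```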

Fravadona · Answer 2 · 2022-09-06T05:42:57.330

1

Why not a simple grep?

grep -vxFf file2 file1
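
What each flag contributes, shown as a small commented wrapper (a sketch; `subtract_grep` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the grep one-liner:
#   -F  treat each line of $2 as a fixed string, not a regex
#   -x  a pattern must match the whole line, not a substring
#   -f  read the patterns from the file $2
#   -v  print the lines of $1 that match none of those patterns
subtract_grep() {
    grep -vxFf "$2" "$1"
}

# Usage (placeholder file names): subtract_grep file1 file2 > result.conf
```

Without `-F`, characters such as `.` in file2 would be read as regex metacharacters; without `-x`, a short line in file2 could remove longer lines that merely contain it. If file2 is empty, grep reads zero patterns, so `-v` prints every line of file1, which is the correct difference.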

update:

Handling heterogeneous white spaces:

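#!/bin/bash

# Trim leading and trailing white space, then squeeze each remaining
# run of white space (blanks or tabs) down to a single blank.
# -E (extended regex) makes "+" work with non-GNU sed as well.
normalize_spaces() {
    sed -E -e 's/^[[:space:]]+//' \
        -e 's/[[:space:]]+$//' \
        -e 's/[[:space:]]+/ /g' \
        -- "$@"
}

grep -vxFf <(normalize_spaces file2) <(normalize_spaces file1)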

edited Sep 06 '22 at 05:42

answered Sep 05 '22 at 18:04

Fravadona



@EdMorton My bad, I forgot to put it; thanks, it's fixed now – Fravadona Sep 05 '22 at 19:01

You're welcome. It's easy to miss cases when the provided sample input/output doesn't cover them - too often all we get is the most trivial sunny-day examples to test with. – Ed Morton Sep 05 '22 at 19:04

j_b · Answer 3 · 2022-09-05T13:55:01.673

0

Assuming the files you are comparing are sorted, diff may also be an option:

$ diff --unchanged-group-format="" --new-group-format="%>" f1.txt f2.txt 
ready   serv2   FBgn003mLo   
ready   serv3   FBgn002lPx

(I noted the extra white space in the OP's data as mentioned in the answer by @markp-fuso, but that did not affect my diff results)
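
Note that `--new-group-format="%>"` prints the lines that appear only in the second file (f2.txt), which is the opposite side from what the question asks for. Below is a minimal sketch that sets every group format explicitly so the output does not rely on defaults, assuming GNU diff; `subtract_diff` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print only the lines of $1 that diff reports as
# absent from $2 (assumes GNU diff for the group-format options).
#   unchanged lines and lines only in $2 print nothing;
#   lines only in $1 (old groups, or the $1 side of changed groups)
#   print as-is via %<.
subtract_diff() {
    diff --unchanged-group-format='' \
         --new-group-format='' \
         --old-group-format='%<' \
         --changed-group-format='%<' \
         "$1" "$2"
}

# Usage (placeholder file names): subtract_diff f1.txt f2.txt > result.conf
```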

edited Sep 05 '22 at 13:55

answered Sep 05 '22 at 13:17

j_b

