
I have 2 files with settings:

file1.txt            file2.txt

A=1                  A=2
B=3                  B=3
C=5                  C=4
D=6                  .
.                    E=7

I am looking for the best approach to replace the values in file1.txt with the differing values from file2.txt (and to add settings that exist only in file2.txt), so that file1.txt would look like:

file1.txt:

A=2       
B=3       
C=4       
D=6       
E=7

I haven't written any code yet, but the only approach I can think of is to write a bash script that diffs both files (provided as positional arguments) and uses sed to replace the non-matching strings. Something in this vein:

./diffreplace.bash file1.txt file2.txt > NEWfile1.txt

I wonder whether something more elegant already exists?

faceless

2 Answers


All of the following solutions may change the order of assignments. I assumed that would be ok.

Lazy Solution

If you use these assignments in some way that allows overwriting, then you can simply append file2 (new) to the end of file1 (old). All old values will be overwritten by the new ones when you execute result.

cat old new > result
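
To illustrate why appending is enough, here is a minimal sketch. It assumes the settings are plain shell-compatible assignments (no spaces around `=`, no special characters in the values) and that they are consumed by sourcing the file. The temporary directory and the file names are made up for the demo.

```shell
# Demo of "the later assignment wins"; file names are hypothetical.
dir=$(mktemp -d)
printf 'A=1\nB=3\nC=5\nD=6\n' > "$dir/old"
printf 'A=2\nB=3\nC=4\nE=7\n' > "$dir/new"

cat "$dir/old" "$dir/new" > "$dir/result"

# Source the combined file: each variable ends up with its last assigned value.
. "$dir/result"
echo "A=$A B=$B C=$C D=$D E=$E"
```

With the sample data this prints `A=2 B=3 C=4 D=6 E=7`. The result file itself still contains both the old and the new lines.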

Slightly Better Solution

Extending the previous approach, you can put new first, iterate over the lines, and for every variable keep only the first assignment you see (which is the one from new whenever new sets that variable):

cat new old |
awk -F= '{if (a[$1]!="x") {print $0; a[$1]="x"}}'

Alternative Solution

Use join to combine both files on the variable name, then use cut to keep only the name and the first value, which comes from the first file (new). When your files are sorted, use

join -t= -a1 -a2 new old | cut -d= -f1,2 

if not, use

join -t= -a1 -a2 <(sort new) <(sort old) |
cut -d= -f1,2
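
To make the two steps visible, the following sketch runs them separately on the sample data from the question. The temporary directory and file names are assumptions for the demo, and the inputs are written already sorted by key.

```shell
# Sample data from the question, already sorted by key; file names are hypothetical.
dir=$(mktemp -d)
printf 'A=1\nB=3\nC=5\nD=6\n' > "$dir/old"
printf 'A=2\nB=3\nC=4\nE=7\n' > "$dir/new"

# Step 1: join on the key. Paired lines carry both values, the one from new first.
joined=$(join -t= -a1 -a2 "$dir/new" "$dir/old")
echo "$joined"

# Step 2: keep only the key and the first value.
merged=$(echo "$joined" | cut -d= -f1,2)
echo "$merged"

rm -rf "$dir"
```

The intermediate result is `A=2=1`, `B=3=3`, `C=4=5`, `D=6`, `E=7` (one per line), and after cut it becomes `A=2`, `B=3`, `C=4`, `D=6`, `E=7`. Note that a value that itself contains `=` would be truncated by the cut step.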
Socowi
  • the structure of the file must remain untouched. Sort mixes the order. – faceless Nov 13 '18 at 13:34
  • And what structure is that? A, if it exists, is always on the first line in both files or empty, B on the second etc.? – James Brown Nov 13 '18 at 13:41
  • @Socowi - the second 'join' solution worked best for my needs. But there is one thing: I need to merge the diffs into the 'new' file. When I use the statement as you suggested, I see the right output on the screen. But if I add '> new' to it, the 'new' file gets replaced by 'old' and contains none of the lines that are missing from 'old', even though there are a few and I could see them before redirecting the output. Do you have any idea why? – faceless Nov 15 '18 at 18:41
  • @faceless You cannot redirect to a file while reading it. See [this question](https://stackoverflow.com/q/6696842/6770384). Redirect to a temporary file or use `sponge` from `moreutils`. – Socowi Nov 15 '18 at 19:47
  • @Socowi - thanks for your last comment, it helped. Another question about the 'join' solution. When I run it on big (prod) files, I get "join: /file/:number: is not sorted: pattern=...". As a result I get duplicate entries in the output for that pattern, because there is no '-f2' for 'cut' command. I tried several articles that explain how to fix the sorting issue (-d, --nocheck-order,...) but none of them resolved it. These unsorted patterns are on different lines in each file, so maybe this is the reason? Please help me resolve it, as I'd really like to keep the simplicity of join in my solution. – faceless Nov 18 '18 at 10:32
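
One possible way to handle both follow-up questions in the comments above (writing the result back into new, and the "is not sorted" error) is sketched here. It is not the answerer's own fix. The function name `merge_settings` is hypothetical, and the sketch assumes that each key appears at most once per file and that values contain no `=`. The idea is to sort on the key field only and to run sort and join under the same locale, since a mismatch between the two is a common cause of that error.

```shell
# merge_settings NEW OLD: hypothetical helper, not from the answer.
# Rewrites NEW so that it holds every key from both files, with NEW's value
# winning when a key appears in both. The result is sorted by key.
merge_settings() {
    new=$1 old=$2
    tmpdir=$(mktemp -d) || return 1
    # Sort on the key field only, in the C locale, so that sort and join
    # agree on the ordering.
    LC_ALL=C sort -t= -k1,1 "$new" > "$tmpdir/new.sorted"
    LC_ALL=C sort -t= -k1,1 "$old" > "$tmpdir/old.sorted"
    # Write to a temporary file and move it into place only on success,
    # so a failed join cannot leave NEW truncated.
    LC_ALL=C join -t= -a1 -a2 "$tmpdir/new.sorted" "$tmpdir/old.sorted" |
        cut -d= -f1,2 > "$tmpdir/merged" &&
        mv "$tmpdir/merged" "$new"
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

It would be called as `merge_settings new old`. Because the output is sorted by key, the original line order of new is not preserved.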

I'm a little puzzled by your comment "the structure of the file must remain untouched. Sort mixes the order", so I'm assuming that the A's are always on line 1 (or line 1 is "."), the B's on line 2, and so on:

$ awk '
BEGIN { RS="\r?\n" }     # in case of Windows line-endings (regex RS needs gawk or mawk)
$0!="." {                # we don't store . (change it to null if you need to)
    a[FNR]=$0            # hash using line number as key
}
FNR>n { n=FNR }          # remember the line count of the longest file
END {                    # after all that hashing
    for(i=1;i<=n;i++)    # iterate in line number order
        print a[i]       # output the last met version
}' file1 file2           # mind the file order

Output:

A=2
B=3
C=4
D=6
E=7

Edit: A version with a whitelist:

$ cat whitelist
A
B
E

Script:

$ awk -F= '
NR==FNR {                # process the whitelist
    a[FNR]=$1            # for a, the key is the line number, the record is the value
    b[$1]=FNR            # for b, the record is the key, the line number is the value
    n=FNR                # remember the count for END
    next
}                        # process file1 and file2 ... filen
($1 in b) {              # if record is found in b
    a[b[$1]]=$0          # we set the record to a[linenumber]=record
}
END {
    for(i=1;i<=n;i++)    # here we loop on linenumbers, 1 to n
        print a[i]
}' whitelist file1 file2

Output:

A=2
B=3
E=7
James Brown
  • I need to test it and will approve a bit later. The concern is that the line positions in both files are not constant and may vary, whereas the variables' names are always the same. Also, is it possible with this approach to define a "white list" of settings in the first file that must not be changed? – faceless Nov 13 '18 at 14:13
  • Answering my previous comment I assume some sort must be applied to make the task easier – faceless Nov 13 '18 at 14:14
  • _white list_ - sure. – James Brown Nov 13 '18 at 14:53
  • I wonder why you and I are getting different results for the same code and the same files. When I test your first solution (w/o whitelist), the output contains only the contents of 'file2': not merged with the differences from 'file1', but overriding them. I get only A=2 B=3 C=4 E=7, but no "D" in the output. I am not deeply familiar with AWK, so could you please explain this difference? This approach seems very elegant and I'd like to learn more about it from this example. – faceless Nov 14 '18 at 13:13
  • as for now i have created the following fork to count the differences between the files (with whitelist). But it uses a different approach. More character consuming: http://freetexthost.com/i30r14dghs – faceless Nov 14 '18 at 13:16
  • You have Windows line-endings (`\r\n`) whereas I'm using Linux (`\n`). I'll add `BEGIN{RS="\r?\n"}` to the first version, which should work for both. – James Brown Nov 14 '18 at 13:42
  • i'm on linux too. This code seems to be replacing the whole strings if they are not equal - the D is being replaced by E in file1, values of other variables were changed according to file2, and there are still 4 rows in the file. I need the existing variables in file1 not to be wiped out, different variables - replaced, and not existing - appended. Ideas? – faceless Nov 14 '18 at 15:10
  • The first one replaces based on the line number: if the A=?s are on the 1st line in both files, line 1 from file1 is replaced by line 1 from file2 (in the output). The latter digests the whitelist first, and only letters from the whitelist are processed and output in the order of the whitelist. – James Brown Nov 14 '18 at 15:20
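
For the requirement stated in the comments above (keep file1's variables and their order, replace the values that differ, append the variables that only file2 has), a key-based merge is one option. The following is a sketch under stated assumptions, not code from either answer: the function name `merge_keep_order` is hypothetical, blank lines and "." placeholder lines are dropped, and the `\r` escape in the regex is assumed to be supported by the awk in use (it is in gawk and mawk).

```shell
# merge_keep_order FILE1 FILE2: hypothetical helper, not from either answer.
# Prints FILE1 in its original order with values taken from FILE2 where
# FILE2 sets the same key, then the keys that only FILE2 has.
merge_keep_order() {
    # FILE2 is read first so its values are known before FILE1 is printed.
    awk -F= '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    $0 == "" || $0 == "." { next }          # skip blank and "." placeholder lines
    FILENAME == ARGV[1] {                   # lines of FILE2
        if (!($1 in val)) order[++n] = $1   # remember first-seen key order
        val[$1] = $0                        # last assignment for a key wins
        next
    }
    {                                       # lines of FILE1, in their original order
        if ($1 in val) { print val[$1]; used[$1] = 1 }
        else print
    }
    END {                                   # append keys that only FILE2 has
        for (i = 1; i <= n; i++)
            if (!(order[i] in used)) print val[order[i]]
    }' "$2" "$1"
}
```

Called as `merge_keep_order file1.txt file2.txt > NEWfile1.txt`, it needs no sorting and does not depend on line positions. Write to a separate output file rather than redirecting straight into file1.txt.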