How to simply keep up-to-date records in file

Question

I would like to simply keep up-to-date records in file A given file B using bash on Linux.

Both files, A and B, have the same structure.

Each line of a file holds one record, consisting of a public key and a comment separated by a space. The comment has the form user@hostname and is unique within the file.

Example:

B file
xxxxxx user1@hostname1
yyyyyy user2@hostname2
wwwwww user3@hostname3


A file
yxxxxx user1@hostname1
zzzzzz user4@hostname4
yyyyyy user2@hostname2

Which should result in:

A file
xxxxxx user1@hostname1
zzzzzz user4@hostname4
yyyyyy user2@hostname2
wwwwww user3@hostname3

I know I can read file B line by line and check by comment whether file A already contains the record. If not, append the record. If yes, check whether it needs updating. However, that involves many lines of code in a bash script.

Can it be done simpler?

Does this answer your question? [Comparing two files in linux terminal](https://stackoverflow.com/questions/14500787/comparing-two-files-in-linux-terminal) — ChristophS, Jul 02 '21 at 08:39
Maybe https://stackoverflow.com/questions/14500787/comparing-two-files-in-linux-terminal will help you? — ChristophS, Jul 02 '21 at 08:40
If you are attempting to merge `authorized_keys` files for SSH, a much better approach is to keep each authorized key in a separate file, and just `cat *.pub >authorized_keys` to regenerate the file when you have added, modified, or deleted one of the individual keys. — tripleee, Jul 02 '21 at 10:07
@tripleee Sure, I do it as you described, with the exception that other, unmanaged public keys are allowed in the authorized_keys file. Mostly due to the large infrastructure. — Michael Hrabě, Jul 04 '21 at 09:46
I'm afraid I don't understand. What do you mean by "another unmanaged key" and how does it relate to my suggestion? If I guessed your use case correctly, what I propose _is_ a simpler way to do what you ask; though then, your question is basically an [XY problem](https://en.wikipedia.org/wiki/XY_problem). — tripleee, Jul 04 '21 at 10:31
@tripleee Unmanaged key means that someone else can manually enter a value into authorized_keys and I don't want to alter this value. Your suggestion 'cat *.pub >authorized_keys' would remove the unmanaged key persistently. — Michael Hrabě, Jul 04 '21 at 14:49
Sure; but prohibiting this would seem like a small price to pay for a simple and robust solution. It would not be hard to include a "don't edit this file directly" comment at the top of the generated file (though I don't know if comments are supported there? I guess not). It's not harder to add your ad hoc key to a new `*.pub` file instead, but of course you'll need to know about this policy. — tripleee, Jul 04 '21 at 15:17

score 1 · Answer 1 · answered Jul 01 '21 at 17:15

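A little awk script that prints every record of A, then appends the records of B whose comment does not yet occur in A (records already in A keep A's key):

awk '
  # First file (A): print each record and remember its comment.
  NR == FNR {print; seen[$2]; next}
  # Second file (B): print only records whose comment was not seen in A.
  !($2 in seen)
' A B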

To save the changes back to file A, pick one of:

awk '...' A B | sponge A        # from the `moreutils` package

tmp=$(mktemp)
awk '...' A B > "$tmp" && mv "$tmp" A
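
The awk script in this answer appends new records but leaves an existing key in A untouched, whereas the expected result in the question replaces the key of user1@hostname1 with the one from B. A possible variant that also refreshes existing keys is sketched below. It is not the answerer's code: the function name `merge_records` and the array names are made up for illustration, and it assumes every record has exactly two space-separated fields and that comments are unique within each file.

```shell
#!/usr/bin/env bash
# merge_records UPDATES CURRENT   (hypothetical helper, not from the answer)
# Prints CURRENT in its original order with keys refreshed from UPDATES,
# followed by the UPDATES records whose comment does not occur in CURRENT.
merge_records() {
  awk '
    # First file (updates): remember the key per comment, in input order.
    FILENAME == ARGV[1] { key[$2] = $1; order[++n] = $2; next }
    # Second file (current): swap in the newer key when one exists.
    $2 in key { $1 = key[$2]; used[$2] }
    { print }
    # Append the update records that never matched a current record.
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in used)) print key[order[i]], order[i]
    }
  ' "$1" "$2"
}
```

With the files from the question, `merge_records B A` keeps the line order of A, takes the keys from B, and appends the user3@hostname3 record at the end. The output can be written back to A with either of the two methods shown above.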

glenn jackman

score 1 · Answer 2 · answered Jul 01 '21 at 21:02

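Yet another way, which yields the merged records sorted by comment. join needs both inputs sorted on the join field, so each file is sorted on field 2 first:

join -a1 -a2 -j2 <(sort -k2,2 B) <(sort -k2,2 A) | awk '{print $2, $1}'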

Armali

score 0 · Answer 3 · answered Jul 01 '21 at 16:57

You can use the diff command with grep to list the lines that differ between the two files:

diff a.txt b.txt | grep -Po "^(<|>) \K.*"

Moshe Fortgang

score 0 · Accepted Answer · answered Jul 01 '21 at 16:57

I suppose A is the original list and B is the update list.

Find the users listed in B, i.e. the users for which updates exist.

$ cut -d ' ' -f 2 B
user1@hostname1
user2@hostname2
user3@hostname3

Take A and remove all lines with users from B. These are the lines of A for which no update exists.

$ grep -v -f <(cut -d ' ' -f 2 B) A
zzzzzz user4@hostname4

Append B to the above list:

$ grep -v -f <(cut -d ' ' -f 2 B) A; cat B
zzzzzz user4@hostname4
xxxxxx user1@hostname1
yyyyyy user2@hostname2
wwwwww user3@hostname3

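Note: the above works only as long as no user@hostname comment is a substring of another one, because grep matches each pattern anywhere in the line and treats it as a regular expression (so a dot matches any character). If this cannot be guaranteed, you have to anchor the patterns, for example by using extended regular expressions with word boundaries.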
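
One way to sidestep the substring caveat is to compare the comment field exactly instead of pattern-matching whole lines. The following is a minimal sketch of the same two steps with awk doing the filtering; the function name `replace_updated` is hypothetical, and records are assumed to have exactly two space-separated fields.

```shell
#!/usr/bin/env bash
# replace_updated CURRENT UPDATES   (hypothetical helper)
# Step 1: print the CURRENT records whose comment (field 2) has no exact
#         match in UPDATES.
# Step 2: append all UPDATES records.
replace_updated() {
  awk '
    # First file (updates): collect the comments that have an update.
    FILENAME == ARGV[1] { upd[$2]; next }
    # Second file (current): keep records without an update.
    !($2 in upd)
  ' "$2" "$1"
  cat -- "$2"
}
```

Called as `replace_updated A B`, this should produce the same list as the grep pipeline for the example files, while a record such as `kkkkkk xuser1@hostname1` in A would be kept rather than removed.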

score 0 · Answer 5 · answered Jul 06 '21 at 11:42

If you are looking for a way to manage SSH keys in an authorized_keys file, my suggestion would be to generate this file from the *.pub keys in the current directory. Now the problem is reduced to adding a new key file, or removing or renaming a key file you want to exclude, and rerunning

cat *.pub >authorized_keys

(perhaps by way of a Makefile if that is a mechanism your users are familiar and comfortable with).
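
As a rough sketch of that regeneration step, the function below builds the file in a temporary location and only then moves it into place, so a failed run does not leave a half-written authorized_keys. The function name `regenerate_authorized_keys`, the directory argument, and the header text are assumptions for illustration. sshd ignores lines that start with `#`, so a "do not edit" note at the top is harmless.

```shell
#!/usr/bin/env bash
# regenerate_authorized_keys DIR   (hypothetical helper)
# Rebuilds DIR/authorized_keys from every *.pub file found in DIR.
regenerate_authorized_keys() {
  local dir=$1 tmp
  # Create the temporary file next to the target so mv stays on one filesystem.
  tmp=$(mktemp "$dir/authorized_keys.XXXXXX") || return 1
  {
    echo '# Generated from the *.pub files; add or remove key files instead of editing this file.'
    cat -- "$dir"/*.pub
  } > "$tmp" && mv -- "$tmp" "$dir/authorized_keys" || {
    # cat failed (for example no *.pub files): discard the partial result.
    rm -f -- "$tmp"
    return 1
  }
}
```

Note that this still overwrites any key that was added to authorized_keys by hand, which is exactly the trade-off discussed in the comments on the question.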

Obviously, there is a usability problem for users who forget or are unaware of this mechanism; but in many environments, this is acceptable and manageable with documentation and training.

The general mechanism of splitting monolithic configuration or data files into individual smaller files with simple fragments you can enable or disable individually is a good one to know about anyway. It is used in many places e.g. in Debian (see also run-parts for example) and systemd, so it should be easily recognizable and appreciated by admins.
