Add a suffix to each line containing a specific word

Question

I have 2 files:

file1

abc
def
ghi
jkl

file2 (may not be sorted in the same order)

ghi:Checked
def:Checked

I would like to create a file like this:

abc
def:Checked
ghi:Checked
jkl

Is there a way to do that in the shell? I would like to keep the line order of file1 unchanged.

Your sample files are sorted. If they were not, would sorting be allowed? — user1984, Aug 30 '21 at 10:02
Strictly speaking, this question is about a left join, not an inner join, so maybe it makes sense to leave it open. — Ionuț G. Stan, Aug 30 '21 at 10:16
The duplicate has several answers which do not require sorting. — tripleee, Aug 30 '21 at 10:37
@Alex: For performance reasons, I would put each _Checked_ key from file2 into an associative array, then loop through file1 and test whether each key is in the array. — user1934428, Aug 30 '21 at 10:40
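A minimal sketch of the associative-array approach from the comment above. The function name `suffix_checked` is hypothetical. It assumes bash 4 or later, non-empty keys, file2 lines of the form `key:suffix`, and input files that end with a newline. Only builtins are used inside the loops, so no child process is created per line.

```shell
#!/bin/bash
# Associative-array version: read file2 once into memory, then walk file1.
# Requires bash 4+ (local -A). Assumes every file2 line looks like
# "key:suffix", keys are non-empty, and both files end with a newline.
suffix_checked() {
  local keys_file=$1 marks_file=$2
  local -A marked=()   # key -> whole file2 line, e.g. ghi -> ghi:Checked
  local line

  # Pass 1: index file2 by the text before the first ':'
  while IFS= read -r line; do
    marked[${line%%:*}]=$line
  done < "$marks_file"

  # Pass 2: print file1 in its original order (builtins only, no external commands)
  while IFS= read -r line; do
    if [[ ${marked[$line]+set} ]]; then
      printf '%s\n' "${marked[$line]}"
    else
      printf '%s\n' "$line"
    fi
  done < "$keys_file"
}
```

Usage: `suffix_checked file1 file2 > file3`. The trade-off is the one discussed further down: file2 is held in memory, but each file is read exactly once.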

ceving · Accepted Answer · 2021-08-30T14:25:38.917


Try this:

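#! /bin/bash

exec < file1

# For each line of file1, print the matching "id:..." line from file2
# if grep finds one (exit status 0); otherwise print the line unchanged.
# Note: $id is used inside a regex, so characters such as . or * in a key
# are treated as regex syntax.
while IFS= read -r id; do
  check=$(grep "^$id:" file2)
  if (($? == 0)); then
    echo "$check"
  else
    echo "$id"
  fi
done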

Update: alternative implementation, which reads file2 just once.

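#! /bin/bash

# Keep only the ":Checked" lines of file2 in memory, read once
file2=$(grep ':Checked$' file2)

exec < file1

while IFS= read -r id; do
  # Search the in-memory copy; quoting "$file2" preserves its newlines
  # (bash versions before 4.4 word-split unquoted here-strings)
  check=$(grep "^$id:" <<< "$file2")
  if (($? == 0)); then
    echo "$check"
  else
    echo "$id"
  fi
done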

edited Aug 30 '21 at 14:25

answered Aug 30 '21 at 08:52

ceving

Works well, though I would have preferred not to write the loop myself. – Alex Aug 30 '21 at 10:35
Can also be written as `if check=$(grep "^$id:" file2); then ...`, without the need for `(($? == 0))`. Just my two cents. – Jetchisel Aug 30 '21 at 10:40
@ceving: Fine as long as `file1` is not too big, since you are creating one child process for each line in file1. – user1934428 Aug 30 '21 at 10:41
@user1934428 Creating child processes for almost everything is the way shell scripting works. If you do not like it, use Perl instead. – ceving Aug 30 '21 at 11:01
@ceving : Well, you can solve this in bash *without* having a child process. I am not against child processes as such, I just try to avoid a large amount of unnecessary ones. – user1934428 Aug 30 '21 at 11:06
@user1934428 The problem is not the child process. The problem is that for each line in file1, all lines of file2 are read. If file2 is big, this might get slow. To solve this, you have to keep file2 in memory, but that is just a different price you pay. If you re-read file2, you pay with CPU time. If you keep file2 in memory, you pay with memory usage. Which way is better depends on the requirements. – ceving Aug 30 '21 at 14:17
@ceving : That's why I suggested in my comment to use an associative array for the content of file2. If the files in question are so big that we run into memory problems, the whole algorithm needs to be reconsidered anyway. – user1934428 Aug 30 '21 at 14:33
@user1934428 The memory footprint of my first version does not depend on the size of the files. The files can be as large as the storage you can buy. Only the second version will fail if file2 is bigger than available memory. – ceving Aug 30 '21 at 14:43
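For comparison, a sketch (not from this thread) of the keep-file2-in-memory strategy done in a single awk process, using the common `NR == FNR` two-file idiom: there is neither one grep per line of file1 nor a repeated read of file2. Like the second version above, it needs file2 to fit in memory. The function name `suffix_checked_awk` is hypothetical, and the sketch assumes keys contain no ':' and file2 is non-empty.

```shell
#!/bin/bash
# Single-process alternative using awk's two-file idiom.
# Assumes keys contain no ':' and that file2 is non-empty (with an empty
# file2, NR == FNR would also hold while reading file1).
suffix_checked_awk() {
  local keys_file=$1 marks_file=$2
  awk -F: '
    NR == FNR      { marked[$1] = $0; next }   # first file: remember each line by its key
    ($0 in marked) { print marked[$0]; next }  # second file: key was seen in file2
                   { print }                   # otherwise print the line unchanged
  ' "$marks_file" "$keys_file"
}
```

Usage: `suffix_checked_awk file1 file2 > file3`. Note the argument order inside the function: file2 must be read first so its keys are known before file1 is processed.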

Ionuț G. Stan · Answer 2 · 2021-08-30T10:00:38.480


This should do it:

$ join -a 1 -t: <( sort file1 ) <( sort file2 )
abc
def:Checked
ghi:Checked
jkl

edited Aug 30 '21 at 10:00

answered Aug 30 '21 at 08:57

Ionuț G. Stan

Interesting, but it doesn't work if file2 isn't sorted – Alex Aug 30 '21 at 09:18
@Alex does it work if you sort them? I've edited my answer to incorporate that. – Ionuț G. Stan Aug 30 '21 at 10:01
Thanks for the edit, but in my case it would be better not to sort. – Alex Aug 30 '21 at 10:26
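If the objection to sorting is that it changes the output order, one workaround (not part of this answer) is to carry file1's original line numbers through the join and sort back on them at the end. A sketch, with the hypothetical function name `join_keep_order`, assuming no key contains ':'. `LC_ALL=C` is set on both `sort` and `join` so they agree on collation, which `join` relies on.

```shell
#!/bin/bash
# Keep file1's order while still using join: number file1's lines, join on the
# key, then sort back on the line number. Assumes keys contain no ':'.
join_keep_order() {
  local keys_file=$1 marks_file=$2
  # awk emits "key:lineno"; join -a 1 emits "key:lineno" or "key:lineno:Checked";
  # the numeric sort on field 2 restores file1's order; cut drops field 2.
  awk -v OFS=: '{ print $0, NR }' "$keys_file" \
    | LC_ALL=C sort -t: -k1,1 \
    | LC_ALL=C join -t: -a 1 - <(LC_ALL=C sort -t: -k1,1 "$marks_file") \
    | sort -t: -k2,2n \
    | cut -d: -f1,3-
}
```

Usage: `join_keep_order file1 file2 > file3`. The files are still sorted internally, but neither input file is modified and the result follows file1's original order.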