I have 2 files:

file1

abc
def
ghi
jkl

file2 (might not be sorted in the same order)

ghi:Checked
def:Checked

I would like to create a file like this:

abc
def:Checked
ghi:Checked
jkl

Is there a way to do that in the shell? I would like to keep the line order of file1 unchanged.

Alex

2 Answers

Try this:

#! /bin/bash

exec < file1

while read -r id; do
  check=$(grep "^$id:" file2)
  if (($? == 0)); then
    echo "$check"
  else
    echo "$id"
  fi
done

Update: alternative implementation, which reads file2 just once.

#! /bin/bash

file2=$(grep ':Checked$' file2)

exec < file1

while read -r id; do
  check=$(grep "^$id:" <<< "$file2")
  if (($? == 0)); then
    echo "$check"
  else
    echo "$id"
  fi
done
ceving
  • Works well. I would have preferred not to have to write the loop myself, though – Alex Aug 30 '21 at 10:35
  • Can also be `if check=$(grep "^$id:" file2.txt); then ...` without the need for `(($? == 0))`, just my two cents. – Jetchisel Aug 30 '21 at 10:40
  • @ceving : Fine, if `file1` is not too big, because you are creating one child process for each line in file1. – user1934428 Aug 30 '21 at 10:41
  • @user1934428 Creating child processes for almost everything is the way shell scripting works. If you do not like it, use Perl instead. – ceving Aug 30 '21 at 11:01
  • @ceving : Well, you can solve this in bash *without* having a child process. I am not against child processes as such, I just try to avoid a large amount of unnecessary ones. – user1934428 Aug 30 '21 at 11:06
  • @user1934428 The problem is not the child process. The problem is that for each line in file1, all lines of file2 are read. If file2 is big, this might get slow. To solve this, you have to keep file2 in memory. But this is just another price you pay. If you re-read file2, you pay with CPU time. If you keep file2 in memory, you pay with memory usage. Which way is better depends on the requirements. – ceving Aug 30 '21 at 14:17
  • @ceving : That's why I suggested in my comment to use an associative array for the content of file2. If the files in question are so big that we run into memory problems, the whole algorithm needs to be reconsidered anyway. – user1934428 Aug 30 '21 at 14:33
  • @user1934428 The memory footprint of my first version does not depend on the size of the files. The files can be as big as you can buy storage. Only the second version will fail, if file2 is bigger than the memory. – ceving Aug 30 '21 at 14:43
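
The associative-array approach mentioned in the comments could look roughly like the sketch below. It assumes bash 4 or later; `merge_checked` is a hypothetical helper name, and everything before the first `:` in a file2 line is taken as the key. It loads file2 into memory once and starts no child processes.

```shell
#!/bin/bash
# Sketch only: merge_checked is a hypothetical name, not from the answers above.
# Usage: merge_checked file1 file2
merge_checked() {
  local list=$1 status=$2 id line
  local -A checked=()   # key: id, value: full file2 line (bash 4+)

  # Read file2 once; the text before the first ':' is the key.
  while IFS= read -r line; do
    [[ $line == ?*:* ]] && checked[${line%%:*}]=$line
  done < "$status"

  # Walk file1 in its original order and substitute known ids.
  while IFS= read -r id; do
    if [[ -n $id && -n ${checked[$id]+set} ]]; then
      printf '%s\n' "${checked[$id]}"
    else
      printf '%s\n' "$id"
    fi
  done < "$list"
}
```

Called as `merge_checked file1 file2 > merged`, this trades memory (all of file2 is held in the array) for a single pass over each file, which is the trade-off discussed in the comments above.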

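This should do it, as long as file1 is already sorted (as in the example). `join` needs sorted input, so the output follows the sorted order rather than the original order of file1: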

$ join -a 1 -t: <( sort file1 ) <( sort file2 )
abc
def:Checked
ghi:Checked
jkl
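
If file1 is not sorted and its order has to be kept, a single awk pass is one alternative to `join`. This is a minimal sketch, not part of the original answer: `annotate_in_order` is a hypothetical helper name, and the field before the first `:` in file2 is assumed to be the key.

```shell
# Sketch only: annotate_in_order is a hypothetical name.
# Usage: annotate_in_order file1 file2
annotate_in_order() {
  # awk reads file2 first to build a lookup table, then file1 in its own order.
  awk -F: '
    FILENAME == ARGV[1] { status[$1] = $0; next }  # file2: remember each line by its id
    ($0 in status)      { print status[$0]; next } # file1: id found, print the file2 line
                        { print }                  # file1: id not found, print unchanged
  ' "$2" "$1"
}
```

Comparing `FILENAME` with `ARGV[1]` rather than using the common `NR == FNR` test keeps the lookup correct when file2 is empty.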
Ionuț G. Stan