0

I have two file which I need to compare, and if the first column in file1 matches part of the fisrt columns in file2, then add them side by side in file3, below is an example:

File1:

123123,ABC,2016-08-18,18:53:53
456456,ABC,2016-08-18,18:53:53
789789,ABC,2016-08-18,18:53:53
123123,ABC,2016-02-15,12:46:22

File2

789789_TTT,567774,223452
123123_TTT,121212,343434
456456_TTT,323232,223344

output:

123123,ABC,2016-08-18,18:53:53,123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53,456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53,789789_TTT,567774,223452
123123,ABC,2016-02-15,18:53:53,123123_TTT,121212,343434

Thanks..

Ali Jaber
  • 75
  • 1
  • 1
  • 7

3 Answers3

1

Usin Gnu AWK:

$ awk -F, 'NR==FNR{a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0;next} $1 in a{print $0","a[$1]}' file2 file1
123123,ABC,2016-08-18,18:53:53 123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53 456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53 789789_TTT,567774,223452
123123,ABC,2016-02-15,12:46:22 123123_TTT,121212,343434

Explanation:

NR==FNR {                                   # for the first file (file2)
    a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0 # store to array
    next
} 
$1 in a {                                   # if the key from second file in array
    print $0","a[$1]                        # output
}
James Brown
  • 36,089
  • 7
  • 43
  • 59
  • thanks, but im getting an error when using it: `awk -F, 'NR==FNR{a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0;next} $1 in a{print $0","a[$1]}' file2.txt file1.txt awk: syntax error near line 1 awk: illegal statement near line 1 awk: syntax error near line 1 awk: bailing out near line 1` – Ali Jaber Aug 28 '16 at 13:26
  • That's odd. It works fine on my computer. Are you using Gnu awk? – James Brown Aug 28 '16 at 16:24
1

awk solution matches keys formed from file2 against column 1 of file1 - should also work on Solaris using /usr/xpg4/bin/awk - I took the liberty of assuming the last line of OP output has a typo

file1=$1
file2=$2
AWK=awk
[[ $(uname) == SunOS ]] && AWK=/usr/xpg4/bin/awk
$AWK -F',' '
BEGIN{OFS=","}
# file2 key is part of $1 till underscore 
FNR==NR{key=substr($1,1,index($1,"_")-1); f2[key]=$0; next}
$1 in f2 {print $0, f2[$1]}
' $file2 $file1

tested

123123,ABC,2016-08-18,18:53:53,123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53,456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53,789789_TTT,567774,223452
123123,ABC,2016-02-15,12:46:22,123123_TTT,121212,343434
0

Pure bash solution

file1=$1
file2=$2
while IFS= read -r line; do
  key=${line%%_*}
  f2[key]=$line
done <$file2
while IFS= read -r line; do
  key=${line%%,*}
  [[ -n ${f2[key]} ]] || continue
  echo "$line,${f2[key]}"
done <$file1