Comparing two files and printing lines with similar strings in one file

Question

I have two files that I need to compare. If the first column in file1 matches part of the first column in file2, the two lines should be written side by side to file3. Below is an example:

File1:

123123,ABC,2016-08-18,18:53:53
456456,ABC,2016-08-18,18:53:53
789789,ABC,2016-08-18,18:53:53
123123,ABC,2016-02-15,12:46:22

File2:

789789_TTT,567774,223452
123123_TTT,121212,343434
456456_TTT,323232,223344

Output:

123123,ABC,2016-08-18,18:53:53,123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53,456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53,789789_TTT,567774,223452
123123,ABC,2016-02-15,18:53:53,123123_TTT,121212,343434

Thanks..

is the last line of output supposed to be `123123,ABC,2016-02-15,12:46:22,123123_TTT,121212,343434` — pakistanprogrammerclub, Aug 28 '16 at 12:35
yes, since column 1 in file 1 is matching the first col in file 2 — Ali Jaber, Aug 28 '16 at 13:24

score 1 · Answer 1 · answered Aug 28 '16 at 11:46

Using GNU awk:

$ awk -F, 'NR==FNR{a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0;next} $1 in a{print $0","a[$1]}' file2 file1
123123,ABC,2016-08-18,18:53:53,123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53,456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53,789789_TTT,567774,223452
123123,ABC,2016-02-15,12:46:22,123123_TTT,121212,343434

Explanation:

NR==FNR {                                   # for the first file (file2)
    a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0 # store the line, keyed by the part of $1 before "_"
    next
}
$1 in a {                                   # if $1 of the second file (file1) is a key in the array
    print $0","a[$1]                        # output both lines joined by a comma
}
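
Note that gensub() is a GNU awk extension, so the one-liner above needs gawk. As a minimal sketch, assuming only a POSIX awk is available, the same key can be extracted with split() instead. The scratch directory and the names tmpdir, parts and joined are made up for illustration; the sample data is copied from the question.

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
123123,ABC,2016-08-18,18:53:53
456456,ABC,2016-08-18,18:53:53
789789,ABC,2016-08-18,18:53:53
123123,ABC,2016-02-15,12:46:22
EOF
cat > "$tmpdir/file2" <<'EOF'
789789_TTT,567774,223452
123123_TTT,121212,343434
456456_TTT,323232,223344
EOF

# First pass (file2): split() cuts $1 at "_" and parts[1] becomes the key.
# Second pass (file1): when $1 is a known key, print the line plus the stored file2 line.
joined=$(awk -F, '
NR == FNR { split($1, parts, "_"); a[parts[1]] = $0; next }
$1 in a   { print $0 "," a[$1] }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Lines of file1 whose first column has no counterpart in file2 are skipped, the same as in the gensub() version.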

answered Aug 28 '16 at 11:46

James Brown


thanks, but I'm getting an error when using it: `awk -F, 'NR==FNR{a[gensub(/([^_]*)_.*/,"\\1","g",$1)]=$0;next} $1 in a{print $0","a[$1]}' file2.txt file1.txt awk: syntax error near line 1 awk: illegal statement near line 1 awk: syntax error near line 1 awk: bailing out near line 1` – Ali Jaber Aug 28 '16 at 13:26
That's odd. It works fine on my computer. Are you using GNU awk? – James Brown Aug 28 '16 at 16:24

pakistanprogrammerclub · Answer 2 · 2016-08-28T14:04:32.240

1

This awk solution builds keys from file2 and matches them against column 1 of file1. It should also work on Solaris using /usr/xpg4/bin/awk. I took the liberty of assuming the last line of the OP's output has a typo.

file1=$1
file2=$2
AWK=awk
# on Solaris, use the XPG4 awk instead of the default /usr/bin/awk
[[ $(uname) == SunOS ]] && AWK=/usr/xpg4/bin/awk
$AWK -F',' '
BEGIN{OFS=","}
# file2 key is the part of $1 up to the underscore
FNR==NR{key=substr($1,1,index($1,"_")-1); f2[key]=$0; next}
$1 in f2 {print $0, f2[$1]}
' "$file2" "$file1"

Tested output:

123123,ABC,2016-08-18,18:53:53,123123_TTT,121212,343434
456456,ABC,2016-08-18,18:53:53,456456_TTT,323232,223344
789789,ABC,2016-08-18,18:53:53,789789_TTT,567774,223452
123123,ABC,2016-02-15,12:46:22,123123_TTT,121212,343434

edited Aug 28 '16 at 14:04

answered Aug 28 '16 at 11:49

pakistanprogrammerclub


thanks, but I'm getting an error when using it: – Ali Jaber Aug 28 '16 at 13:24
what's the error? – pakistanprogrammerclub Aug 28 '16 at 13:25
`-bash-3.2# file1=$1 -bash-3.2# file2=$2 -bash-3.2# awk -F',' ' > BEGIN{OFS=","} > # file2 key is part of $1 till underscore > ARGIND==1{key=substr($1,1,index($1,"_")-1); f2[key]=$0; next} > $1 in f2 {print $0, f2[$1]} > ' $file2 $file1 awk: syntax error near line 5 awk: bailing out near line 5` – Ali Jaber Aug 28 '16 at 13:27
are you using gnu awk? what does `awk -V` say? – pakistanprogrammerclub Aug 28 '16 at 13:28
I recommend putting the code in a file then running it with the two filename arguments – pakistanprogrammerclub Aug 28 '16 at 13:32
nothing, it's blank: `-bash-3.2# awk -V` – Ali Jaber Aug 28 '16 at 13:32
what OS is it? Not Linux? – pakistanprogrammerclub Aug 28 '16 at 13:33
I'm using a Solaris platform: `-bash-3.2# uname -a SunOS AAA 5.10 Generic_148888-01 sun4v sparc SUNW,Netra-T2000` – Ali Jaber Aug 28 '16 at 13:36
try my pure bash solution just posted – pakistanprogrammerclub Aug 28 '16 at 13:50
also modified this awk solution to work with Solaris XPG4 awk – pakistanprogrammerclub Aug 28 '16 at 14:05

score 0 · Answer 3 · answered Aug 28 '16 at 13:49

Pure bash solution

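file1=$1
file2=$2
# f2 is a plain indexed array: bash evaluates the subscript arithmetically,
# so this relies on the keys being integers, as in the sample data
while IFS= read -r line; do
  key=${line%%_*}    # file2 key: everything before the first underscore
  f2[key]=$line
done <"$file2"
while IFS= read -r line; do
  key=${line%%,*}    # file1 key: the first comma-separated column
  [[ -n ${f2[key]} ]] || continue
  echo "$line,${f2[key]}"
done <"$file1"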
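
The two `%%` parameter expansions do the key extraction in the loops above. A small sketch of what they produce on one sample line from each file; the variable names line1, line2, key1 and key2 are only for illustration:

```shell
line2='123123_TTT,121212,343434'         # a sample line from file2
line1='123123,ABC,2016-02-15,12:46:22'   # a sample line from file1

key2=${line2%%_*}   # remove the longest suffix matching "_*": keeps text before the first "_"
key1=${line1%%,*}   # remove the longest suffix matching ",*": keeps text before the first ","

echo "$key2"
echo "$key1"
```

Both expansions yield the same string for matching lines, which is why the value stored under the file2 key can be looked up again with the file1 key.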

answered Aug 28 '16 at 13:49

pakistanprogrammerclub


This works perfectly, thank you!! – Ali Jaber Aug 29 '16 at 05:34
