1

I have two files: file1 and file2. i want to combine the time between file1 and file2 with common column as 1 and 2. I need to create a separate file which contains the file1 data and file2 mapping time column. But for some entires(see BOLD TEXT) its matching the same line from file2. Whats wrong in my command? is there any way to get the expected output?

Note: there can be more than 2 entries as shown in the bold text. column1 and column2 entries can be same while only time changes.

file 1

9950 | p | 2021-02-27 14:16:21.905
9951 | q | 2021-02-27 14:16:28.845
9954 | r | 2021-02-27 14:17:21.884
9954 | r | 2021-02-27 14:19:46.775
9956 | y | 2021-02-27 14:20:20.830
9957 | z | 2021-02-27 14:20:27.325

file 2

9950 | p | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.956
9956 | y | 2021-02-27 14:20:21.018
9957 | z | 2021-02-27 14:20:27.512

command:

 awk -F'|' 'NR==FNR{a[$1,$2]=$3;next} a[$1,$2]{print $0"\t"a[$1,$2]}' file2 file1

Output:

9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:19:46.956
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | x | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | y | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512

Expected output:

9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | x | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | y | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512

oguz ismail
  • 1
  • 16
  • 47
  • 69
shreyash
  • 426
  • 1
  • 3
  • 11

3 Answers3

3
$ awk '
BEGIN {
    FS=" [|] "
    OFS=" | "
}
NR==FNR {
    a[$1,++c[$1]]=$0          # c[$1] is an instance counter array, 
    next                      # unique for each $1
}
{
    print a[$1,++d[$1]],$NF   # so is d[$1]
}' file1 file2

Output:

9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
...

Updated with request from comments:

$ awk '
BEGIN {
    FS=" [|] "
    OFS=" | "
}
NR==FNR {
    a[$1,++c[$1]]=$0          # c[$1] is an instance counter array, 
    next                      # unique for each $1
}
(($1,++d[$1]) in a) {         # only output if match in both files
    print a[$1,d[$1]],$NF     # so is d[$1]
}' file1 file2
James Brown
  • 36,089
  • 7
  • 43
  • 59
  • Hii, thanks for the solution. but there one thing when there is no mapping in request lets say 9951 transaction doesn't exists in file1 but it exists in file2. then i want to drop that entry. how can i do that? – shreyash Feb 27 '21 at 18:50
3

Similar to James Brown's answer, but only needing to keep one "count" array, and using $1 plus $2 for the key

gawk -F '|' '
    FNR == 1 {delete count}
    {key = $1 SUBSEP $2 SUBSEP (++count[$1,$2])}
    FILENAME == ARGV[1] {time[key] = $3; next}
    key in time {print $0, FS, time[key]}
' file2 file1
glenn jackman
  • 238,783
  • 38
  • 220
  • 352
  • Hey @glenn. Thanks for your answer. Its working perfectly in my scenario. Can you plase explain me the above gawk command. – shreyash Mar 01 '21 at 08:02
3

If this isn't all you need then edit your question to provide more truly representative sample input/output including cases that this doesn't work for:

$ paste file1 file2 | awk '{$7=$8=$9=""; $0=$0; $1=$1}1'
9950 | p | 2021-02-27 14:16:21.905 | 2021-02-27 14:16:22.086
9951 | q | 2021-02-27 14:16:28.845 | 2021-02-27 14:16:29.033
9954 | r | 2021-02-27 14:17:21.884 | 2021-02-27 14:17:22.072
9954 | r | 2021-02-27 14:19:46.775 | 2021-02-27 14:19:46.956
9956 | y | 2021-02-27 14:20:20.830 | 2021-02-27 14:20:21.018
9957 | z | 2021-02-27 14:20:27.325 | 2021-02-27 14:20:27.512

In the above $7=$8=$9="" set those fields to NULL strings but does not change how many fields are in $0, then $0=$0 causes awk to resplit $0 thereby removing those as individual fields but not altering the white space around where they existed, then $1=$1 causes awk to reconstruct $0 from its fields replacing every string that matches FS (i.e. contiguous white space) with the string that matches OFS (i.e. a single blank char). For example (with | tr ' ' '-' added to make the spaces more clearly visible):

$ echo 'a b c' | awk '{print NF, $0}' | tr ' ' '-'
3-a-b-c
$ echo 'a b c' | awk '{$2=""; print NF, $0}' | tr ' ' '-'
3-a--c
$ echo 'a b c' | awk '{$2=""; $0=$0; print NF, $0}' | tr ' ' '-'
2-a--c
$ echo 'a b c' | awk '{$2=""; $0=$0; $1=$1; print NF, $0}' | tr ' ' '-'
2-a-c
Ed Morton
  • 188,023
  • 17
  • 78
  • 185