22

I am using the command below to join two files on their first two columns.

awk 'NR==FNR{a[$1,$2]=substr($0,3);next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt

By default, awk uses whitespace as the field separator. But a field in my files may itself contain a single space between two words, e.g.

File 1:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567

File 2:

ABCD               TEXT1 TEXT2                     12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      31242342342342342342342342343
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

I want the result file to be:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312 12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423 31242342342342342342342342343
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

Any hints?

Apurv

3 Answers

47

awk supports a regular expression as the value of FS so you can specify a regular expression that matches at least two spaces. Something like -F '[[:space:]][[:space:]]+'.

$ awk '{print NF}' File2
4
3
4

$ awk -F '[[:space:]][[:space:]]+' '{print NF}' File2
3
3
3
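
The same kind of separator can be applied to the join from the question. The following is only a sketch under stated assumptions: the sample data is invented, columns are assumed to be separated by two or more plain spaces (so `[ ][ ]+` is used; the `[[:space:]][[:space:]]+` form above would also cover tabs), and the array names `val`, `line`, `order` and `seen` are hypothetical. It reads br02.txt first so that lines from br01.txt come out first with the matching value appended, followed by the lines that exist only in br02.txt.

```shell
# Invented sample data shaped like the question's files.
cat > br01.txt <<'EOF'
ABCD    TEXT1 TEXT2    111
BCDEFG  TEXT3TEXT4     222
QWERT   TEXT5TEXT6     333
EOF
cat > br02.txt <<'EOF'
ABCD    TEXT1 TEXT2    aaa
BCDEFG  TEXT3TEXT4     bbb
MNHT    TEXT8 TEXT9    ccc
EOF

awk -F '[ ][ ]+' '
    # First file on the command line (br02.txt): remember column 3 and the
    # whole line, keyed on columns 1 and 2, and keep the input order.
    NR == FNR {
        k = $1 SUBSEP $2
        val[k] = $3; line[k] = $0; order[++n] = k
        next
    }
    # Second file (br01.txt): print every line, appending the match if any.
    {
        k = $1 SUBSEP $2
        if (k in val) { print $0, val[k]; seen[k] = 1 }
        else print
    }
    # Finally print the br02.txt lines that never matched.
    END {
        for (i = 1; i <= n; i++)
            if (!(order[i] in seen)) print line[order[i]]
    }
' br02.txt br01.txt > joined.txt

cat joined.txt
```

Keeping the whole line in `line` costs memory proportional to the first file read, which is usually acceptable for files of this kind.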
Etan Reisner
  • great! thats working, now I am using command `awk -F '[[:space:]][[:space:]]+' 'NR==FNR{a[$1,$2]=$3;next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_4.txt"}' br01.txt br02.txt`. But between the concatenation of the records from the two file for any row, I see a Line Feed LF character, any hints on avoiding that ? So the joined rows are split into two rows. – Apurv Nov 10 '14 at 11:43
  • `print $0, a[$1,$2]` should be outputting the line from the second file followed by `OFS` (space by default) and then the value of `a[$1,$2]` followed by `ORS` (newline by default). Is your first input file perhaps a DOS newline file? – Etan Reisner Nov 10 '14 at 14:20
  • This helped me parse the output of a system command that always uses at least 2 spaces to delineate columns, so many thanks! – dragon788 Jun 09 '17 at 18:45
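
If an input file does turn out to have DOS (CRLF) line endings, one possible way to handle it is to strip the trailing carriage return from every record before the fields are used. This is a sketch with invented sample files (`dos01.txt`, `dos02.txt`) and assumes two or more plain spaces between columns:

```shell
# Invented sample files with CRLF line endings.
printf 'ABCD  TEXT1 TEXT2  111\r\n' > dos01.txt
printf 'ABCD  TEXT1 TEXT2  aaa\r\n' > dos02.txt

awk -F '[ ][ ]+' '
    { sub(/\r$/, "") }                  # drop a trailing CR; awk re-splits $0
    NR == FNR { a[$1,$2] = $3; next }   # first file: store column 3
    ($1,$2) in a { print $0, a[$1,$2] } # second file: append the stored value
' dos01.txt dos02.txt > nocr.txt

cat nocr.txt
```

Without the `sub()` call, the carriage return stays at the end of `$0` and `$3`, which can make a joined row look as if it were broken in two when displayed.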
4

You are using fixed-width fields, so you should be using GNU awk's FIELDWIDTHS (or similar) to separate the fields, e.g. if the 2nd field is the 15 chars from char 8 to char 22 inclusive in this file:

$ cat file
abc    def ghi        klm
AAAAAAAB C D E F G H IJJJJ
abc       def ghi     klm

$ awk -v FIELDWIDTHS="7 15 4" '{print "<" $2 ">"}' file
<def ghi        >
<B C D E F G H I>
<   def ghi     >

Any solution that relies on a certain number of spaces between fields will fail when you have 1 or zero spaces between your fields.

If you want to strip leading/trailing blanks from your target field(s):

$ awk -v FIELDWIDTHS="7 15 4" '{gsub(/^\s+|\s+$/,"",$2); print "<" $2 ">"}' file
<def ghi>
<B C D E F G H I>
<def ghi>
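
Where GNU awk is not available, `substr()` is one "similar" approach that should work in any POSIX awk. The sketch below assumes the same column layout (widths 7, 15, 4); the file names `fixed.txt` and `field2.txt` and the variable `f2` are made up for illustration, and the sample is built with `printf` so the widths are exact.

```shell
# Build the sample with printf so the column widths (7, 15, 4) are exact.
{
    printf '%-7s%-15s%s\n' 'abc'     'def ghi'         'klm'
    printf '%-7s%-15s%s\n' 'AAAAAAA' 'B C D E F G H I' 'JJJJ'
    printf '%-7s%-15s%s\n' 'abc'     '   def ghi'      'klm'
} > fixed.txt

awk '{
    f2 = substr($0, 8, 15)       # 15 characters starting at character 8
    gsub(/^ +| +$/, "", f2)      # trim leading and trailing spaces
    print "<" f2 ">"
}' fixed.txt > field2.txt

cat field2.txt
```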
Ed Morton
1

awk automatically detects multiple spaces if the field separator is set to " "

Thus, this simply works:

awk -F' ' '{ print $2 }'

to get the second column if you have a table like the one mentioned.

pas-calc
  • _"Thus, this simply works:"_ it doesn't, does it? You are not telling `awk` to distinguish between single and multiple spaces, namely multiple spaces being the delimiters and single spaces columns being considered as single column instead. You're essentially just printing the second character after any number of spaces, in your example (thus not returning `TEXT1 TEXT2` instead as indicated). – gented Feb 16 '22 at 10:30
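
The difference can be checked on a single line. This is a small sketch with an invented sample line; the variable names `default_f2` and `regex_f2` are hypothetical, and the second command assumes columns are separated by two or more plain spaces:

```shell
line='ABCD    TEXT1 TEXT2    123'

# FS=" " splits on any run of whitespace, including the single space.
default_f2=$(printf '%s\n' "$line" | awk -F' ' '{ print $2 }')

# A regex FS of two or more spaces keeps "TEXT1 TEXT2" together.
regex_f2=$(printf '%s\n' "$line" | awk -F '[ ][ ]+' '{ print $2 }')

printf '%s\n%s\n' "$default_f2" "$regex_f2"
```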