Processing 2 files with different field separators using awk

Question

Let's say I have 2 files:

$ cat file1
A:10
B:5
C:12

$ cat file2
100 A
50 B
42 C

I'd like to get output like this:

A 10 100
B 5 50
C 12 42

I tried this:

awk 'BEGIN{FS=":"}NR==FNR{a[$1]=$2;next}{FS=" ";print $2,a[$2],$1}' file1 file2

Which outputs:

  100 A
B 5 50
C 12 42

I guess the problem is that the field separator is changed too late for the first line of the second file. How can I set a different field separator for each file (rather than several separators within a single file)?

Thanks

Edit: a more general case

With file2 as above and file3 like this:

$ cat file3
A:10 foo
B:5 bar
C:12 baz

How can I get:

A 10 foo 100
B 5 bar 50
C 12 baz 42

One option, not necessarily the best, is to preprocess one of the files so it has the same delimiter as the other, and then process them 'naturally' with the single delimiter. — Jonathan Leffler, Jul 01 '14 at 18:11
Possible duplicate of [AWK multiple delimiter](https://stackoverflow.com/q/12204192/608639) — jww, Aug 15 '18 at 23:17
@jww It's not. This question is about how to have different delimiters for different files (not a single file) and the answer is different. — jrjc, Aug 16 '18 at 11:23

score 22 · Accepted Answer · answered Jul 01 '14 at 19:00

Just set FS between files:

awk '...' FS=":" file1 FS=" " file2

i.e.:

$ awk 'NR==FNR{a[$1]=$2;next}{print $2,a[$2],$1}' FS=":" file1 FS=" " file2
A 10 100
B 5 50
C 12 42
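
The same idea seems to extend to the more general file3 case from the question. This is a minimal sketch, assuming file3 can be split on either `:` or a space (`FS='[: ]'`) and that storing `$2 OFS $3` in the array is acceptable. The scratch directory and the `out` variable are only there to make the example self-contained:

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'A:10 foo\nB:5 bar\nC:12 baz\n' > "$dir/file3"
printf '100 A\n50 B\n42 C\n' > "$dir/file2"

# file3 is split on ':' or ' ', file2 on the default single space.
out=$(awk 'NR==FNR{a[$1]=$2 OFS $3;next}{print $2,a[$2],$1}' \
    FS='[: ]' "$dir/file3" FS=' ' "$dir/file2")
printf '%s\n' "$out"
rm -rf "$dir"
```

With the sample data this should print `A 10 foo 100`, `B 5 bar 50` and `C 12 baz 42`, one per line.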

Ed Morton

ahhh nice, that's what I was looking for ! – jrjc Jul 01 '14 at 19:05
Yes, this is what setting variables in the file list is for: populating initial values differently for different files. For anything else, you're better off setting them up front using `-v`. – Ed Morton Jul 01 '14 at 19:12
did not realize FS could be used instead of -F like this – jimh Mar 21 '16 at 04:41

score 1 · Answer 2 · answered Jul 01 '14 at 17:59

You need to get awk to re-split $0 after you change FS.

You can do that with $0=$0 (for example).

So {FS=" ";$0=$0;...} in your final block will do what you want.

Doing that only once, on the first record where FS needs to change, rather than on every record, will likely perform slightly better for large files.
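
A minimal sketch of that "only once" variant, assuming the switch should happen on the first record of the second file (the `FNR==1` rule is reached only for file2, because the `next` in the first block skips it for file1). The scratch directory and the `out` variable are just scaffolding for the example:

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'A:10\nB:5\nC:12\n' > "$dir/file1"
printf '100 A\n50 B\n42 C\n' > "$dir/file2"

# Change FS and re-split $0 once, on the first record of file2;
# later records of file2 are split with the new FS automatically.
out=$(awk 'BEGIN{FS=":"}
           NR==FNR{a[$1]=$2;next}
           FNR==1{FS=" ";$0=$0}
           {print $2,a[$2],$1}' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
rm -rf "$dir"
```

On an awk that re-splits the record when `$0` is assigned, this should give the three lines the question asks for.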

Etan Reisner

@jeanrjc It worked here. What version of `awk` are you using? Did the output change at all when you did that? – Etan Reisner Jul 01 '14 at 18:05
Nothing changed. I'm on a BSD version of awk (Mac user) – jrjc Jul 01 '14 at 18:09
@jeanrjc You ran `awk 'BEGIN{FS=":"}NR==FNR{a[$1]=$2;next}{FS=" ";$0=$0;print $2,a[$2],$1}' file1 file2` and still got your original output? – Etan Reisner Jul 01 '14 at 18:47
yes ! (I copy-pasted what you wrote, and still the same output) – jrjc Jul 01 '14 at 18:49
That's odd. You could try saving `$1`, setting it to something else and then setting it back to the saved value. That might force the re-split in a way that `$0=$0` doesn't but I don't know. You could also try `$1=$1` instead of `$0=$0` and see if that works. – Etan Reisner Jul 01 '14 at 18:54
Do not even bother to try `$1=$1` as that recompiles the current record using the value of OFS, not FS. Assigning `$0=$0` IS the correct way to cause the record to be resplit using the current `FS` value. If that doesn't do that for the OP then his awk is broken. – Ed Morton Jul 01 '14 at 19:06
@EdMorton Thanks for the correction on `$1=$1`. Also for the confirmation that `$0=$0` should work (even in BSD awk). – Etan Reisner Jul 01 '14 at 22:08
@EdMorton: That's odd. I tried on another computer (still a Mac), and it still doesn't work. Does it work on a Mac for you? – jrjc Jul 02 '14 at 07:04
I don't have a Mac. The default awk on Macs is broken in other ways so maybe it's broken in this way too? – Ed Morton Jul 02 '14 at 14:30
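
The difference between `$1=$1` and `$0=$0` discussed in these comments can be checked directly. This is a small sketch, assuming an awk that follows the documented behavior; the input line `a:b c` and the `OFS="-"` setting are made up purely to make the effect visible:

```shell
line='a:b c'

# $1=$1 rebuilds $0 from the existing fields joined with OFS; it does not
# re-split, so $1 is still "a:b" (split on the default FS before FS changed).
rebuilt=$(printf '%s\n' "$line" | awk '{FS=":"; OFS="-"; $1=$1; print $0 " | " $1}')

# $0=$0 re-splits the record with the current FS, so $1 becomes "a".
resplit=$(printf '%s\n' "$line" | awk '{FS=":"; $0=$0; print $1}')

printf '%s\n%s\n' "$rebuilt" "$resplit"
```

The first command should print `a:b-c | a:b` and the second `a`.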

score 1 · Answer 3 · answered Jul 01 '14 at 18:01

You can try something like:

$ cat f1
A:10
B:5
C:12

$ cat f2
100 A
50 B
42 C

$ awk 'NR==FNR{split($0,tmp,/:/);a[tmp[1]]=tmp[2];next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42

or set multiple field separators

$ awk -F"[: ]" 'NR==FNR{a[$1]=$2;next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42
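
To see what the `$2 in a` guard changes, here is a small sketch that adds a made-up row `7 D` to f2, a key that does not exist in f1. The extra row, the scratch directory and the variable names are assumptions for illustration only:

```shell
dir=$(mktemp -d)
printf 'A:10\nB:5\nC:12\n' > "$dir/f1"
printf '100 A\n50 B\n42 C\n7 D\n' > "$dir/f2"   # "7 D" is a hypothetical extra row

# With the guard: rows of f2 whose key is missing from f1 are skipped.
guarded=$(awk -F'[: ]' 'NR==FNR{a[$1]=$2;next} $2 in a{print $2,a[$2],$1}' "$dir/f1" "$dir/f2")

# Without the guard: every row of f2 is printed, with an empty middle field
# when the key is unknown.
unguarded=$(awk -F'[: ]' 'NR==FNR{a[$1]=$2;next} {print $2,a[$2],$1}' "$dir/f1" "$dir/f2")

printf 'guarded:\n%s\nunguarded:\n%s\n' "$guarded" "$unguarded"
rm -rf "$dir"
```

The guarded run should print only the A, B and C rows; the unguarded run should add a fourth line for `D` with nothing between the two separators.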

jaypal singh

`$2 in a` is not the same as his original script. It will omit output that the original would have produced when `file2` contains keys that do not appear in `file1`. – Etan Reisner Jul 01 '14 at 18:07
@EtanReisner That's correct, but it's often the right choice for this kind of implementation. OP didn't specify whether he wanted to keep those lines, but I appreciate your feedback, as it could help OP decide whether to keep the check or remove it. – jaypal singh Jul 01 '14 at 18:15
I'm not sure I understand what you meant, but both files have the same keys (`A`, `B`, `C`). With the multiple field separator, won't it be a problem if there is a `:` in the second file? – jrjc Jul 01 '14 at 18:30
@jeanrjc Yes, there would be a problem. Your sample data doesn't make it clear whether both field separators can appear in **both** files. I would recommend adding some sample data that truly represents your files. – jaypal singh Jul 01 '14 at 18:33
Ok, and can you explain `$2 in a` ? – jrjc Jul 01 '14 at 18:43
@jeanrjc That checks whether the value of `$2` is in the array `a` and then only prints out the line if that is true. So, as I indicated in my first comment, it will only print out lines from `file2` when their second field also appeared in the first field of `file1`. – Etan Reisner Jul 01 '14 at 18:46
The split solution works fine with my real data; file3 was there because I wanted a more general solution. – jrjc Jul 01 '14 at 18:51
@jeanrjc Store `$0` in the `a` array and then print out `$0 a[t[2]]` in the second block? – Etan Reisner Jul 01 '14 at 18:54
