I'm trying to print specific fields from a CSV file using awk, but some lines contain commas that are not field separators. For example, the following line is no problem for me:

ABAKEV,InChI=1S/C10H7NO/c12-7-9-6-5-8-3-1-2-4-10(8)11-9/h1-7H,8,2,H7C10ON,1562.9152

I use:

awk -F "," '{print $1,$3,$5,$6}'

which gives me my desired result:

ABAKEV 8 H7C10ON 1562.9152

However, some lines contain commas inside parentheses, and those commas are supposed to belong to the second field rather than separate fields. For example:

ACEMID03,InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4),18,1,H5C2ON,1491.2031,-,308.5,158.19,CC(=O)N,10.87831,3.89183,54.21

Specifically,

(H2,3,4)

My desired result is:

ACEMID03 18 H5C2ON 1491.2031

Does anyone have any ideas for how I can break this up the way I want to? Preferably I'd like to use awk because I'm more familiar with it. If someone else has any quick solutions, please let me know. Thanks!

Mudkip123

1 Answer

Using GNU awk for FPAT to identify fields:

$ awk -v FPAT='[^,]+|[(][^()]+[)]' '{for (i=1; i<=NF; i++) print i, $i}' file
1 ACEMID03
2 InChI=1S/C2H5NO/c1-2(3)4/h1H3
3 (H2,3,4)
4 18
5 1
6 H5C2ON
7 1491.2031
8 -
9 308.5
10 158.19
11 CC(=O)N
12 10.87831
13 3.89183
14 54.21

$ awk -v FPAT='[^,]+|[(][^()]+[)]' '{print $1,$3,$5,$6}' file
ACEMID03 (H2,3,4) 1 H5C2ON
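
The FPAT output above leaves (H2,3,4) as its own field 3, so the columns after it shift by one compared with lines that have no such group. If the group should instead stay attached to the InChI field, as in the desired output in the question, one possible sketch is to split on plain commas and glue any parenthesized group back onto the field before it. This is a different technique from FPAT and should run in any POSIX awk. The name merge_paren_fields is made up for this sketch, and it assumes that groups are not nested and that every field starting with "(" belongs to the field immediately before it.

```shell
# merge_paren_fields: hypothetical helper name, used only in this sketch.
# Splits on commas, then re-attaches a comma-containing "(...)" group to the
# preceding field so the later columns keep the same numbers on every line.
# Assumption: groups are not nested.
merge_paren_fields() {
    awk -F',' '
    {
        split("", f)                          # clear fields left from the previous line
        n = 0; inside = 0
        for (i = 1; i <= NF; i++) {
            if (inside) {
                f[n] = f[n] "," $i            # still inside "(...)": keep appending
                if ($i ~ /[)]$/) inside = 0   # the group is closed
            } else if ($i ~ /^[(]/ && n > 0) {
                f[n] = f[n] "," $i            # group starts: attach to previous field
                if ($i !~ /[)]$/) inside = 1
            } else {
                f[++n] = $i                   # ordinary field
            }
        }
        print f[1], f[3], f[5], f[6]
    }' "$@"
}
```

Calling merge_paren_fields file on the two sample lines from the question should print ABAKEV 8 H7C10ON 1562.9152 and ACEMID03 18 H5C2ON 1491.2031.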

See also "What's the most robust way to efficiently parse CSV using awk?".

Ed Morton
  • Do you think there is any way I can generalize this solution? With this method, field 3 on one line might be field 5 (for example) on another line. I want to be able to run `awk '{print $1,$2}'` and have every field correspond to the same header on every line. – Mudkip123 Mar 31 '20 at 00:08
  • How could field 3 on one line be field 5 on another within one file? Or are you asking how to print fields by the column name from the first line of different files? If so that's been asked and answered several times on this forum, just search the archives here or google it. If you don't find it then ask a new question about that. – Ed Morton Mar 31 '20 at 00:38
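
For the second reading in the comment above (printing fields by the column name from the first line), a minimal sketch in awk might look like the following. The helper name print_by_name and the column names in the usage line are assumptions, since the thread does not show a header row. It splits on plain commas, so it also assumes no field contains an embedded comma.

```shell
# print_by_name: hypothetical helper name, used only in this sketch.
# First argument: comma-separated column names to print.
# Remaining arguments: input files (reads standard input if none are given).
# Assumption: the first line is a header and no field contains a comma.
print_by_name() {
    cols=$1
    shift
    awk -F',' -v cols="$cols" '
    NR == 1 {
        for (i = 1; i <= NF; i++) idx[$i] = i   # header name -> column number
        n = split(cols, want, ",")
        next
    }
    {
        out = ""
        for (j = 1; j <= n; j++)
            out = (j == 1 ? "" : out " ") $(idx[want[j]])
        print out
    }' "$@"
}
```

A call such as print_by_name id,formula file would then print those two columns wherever they sit in the header. A name that is missing from the header maps to $0, so the whole line is printed in its place.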