
I run two awk commands consecutively to split a string on multiple delimiters. Can they be combined into a single command?

Input data (jot -w "some string, this is number " 10):

some string, this is number 1
some string, this is number 2
some string, this is number 3
some string, this is number 4
some string, this is number 5
some string, this is number 6
some string, this is number 7
some string, this is number 8
some string, this is number 9
some string, this is number 10

This is just example data. I want to split each line on the comma first and then extract the number (the fourth word) from the second part. In practice, the number of spaces (and so the number of words) in the first part can vary, so the following is also valid input:

some string, this is number 1
some string with more spaces, this is number 2

The following command works fine:

$ jot -w "some string, this is number " 10 | awk -F ',' '{print $2}' | awk -F ' ' '{print $4}'
1
2
3
4
5
6
7
8
9
10

Is there a simple way to combine these two commands into a single one?

zelanix
  • `awk` takes regular expressions as delimiters, but if the part before the comma can have different numbers of fields, you might have to run two `awk`s. Perhaps you could use `cut` for the "simpler" part? – Jasper May 05 '14 at 12:58
  • @jasper `cut` is definitely an option, thanks, and yes, I was aware that `awk` can take a regex as the delimiter, but even if the number of spaces doesn't vary before the comma, doing two consecutive awk statements is (in my application) more readable than having to change the index of the field in the second program. Thanks for the comment though. – zelanix May 05 '14 at 13:06
  • Possible duplicate of [AWK multiple delimiter](https://stackoverflow.com/q/12204192/608639) – jww Aug 15 '18 at 23:18
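
The regex-delimiter idea from the first comment can be sketched roughly as follows. The sample lines are fed in with `printf` instead of `jot` (an assumption made for portability, since `jot` is not installed everywhere), and the function name `last_field` is made up for illustration. Because the field count changes with the number of words before the comma, this sketch counts from the end with `$NF` rather than using a fixed index.

```shell
# Hypothetical helper (not from the original post): split each line on runs of
# commas and/or spaces, then print the field count and the last field.
last_field() {
    awk -F'[, ]+' '{ print NF, $NF }'
}

printf '%s\n' \
    'some string, this is number 1' \
    'some string with more spaces, this is number 2' | last_field
# prints:
# 6 1
# 9 2
```

The differing field counts (6 and 9) show why a fixed index such as `$6` would not work once the first part varies in length.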

3 Answers


The split() function will let you do this:

awk '{split($0,a,",");split(a[2],b," ");print b[4];}'
Vaughn Cato
    +1, but should probably explicitly state that one would normally write: `awk '{split($2,a," "); print a[4]}' FS=,` and use the normal field splitting for the first delimiter. – William Pursell May 05 '14 at 13:38
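
The variant suggested in the comment above can be tried on the variable-length input like this. It is only a sketch: the function name `fourth_word_after_comma` is hypothetical, and `printf` stands in for `jot` as an assumption about what is available on the system.

```shell
# Hypothetical helper: normal field splitting handles the comma (FS=,),
# then split() with " " breaks the second field on runs of whitespace.
fourth_word_after_comma() {
    awk -F, '{ split($2, a, " "); print a[4] }'
}

printf '%s\n' \
    'some string, this is number 1' \
    'some string with more spaces, this is number 2' | fourth_word_after_comma
# prints:
# 1
# 2
```

Since only the part after the comma is split again, the index 4 stays valid no matter how many words precede the comma.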

You can use NF to print the last column easily:

jot -w "some string, this is number " 10 |awk '{print $NF}'

Or, following your idea, merge the two awk commands into one:

jot -w "some string, this is number " 10 | awk '{l=split($2,a,OFS);print a[l]}' FS=","
BMW

To solve the problem you describe:

$ cat file
some string, this is number 1
some string with more spaces, this is number 2

$ awk -F, '{n=split($NF,a,/ /); print a[n]}' file
1
2

or if you like golf:

$ awk -F, '{print a[split($NF,a,/ /)]}' file
1
2

but obviously with the input you specified this would work:

$ awk '{print $NF}' file
1
2

as would various other solutions.

Ed Morton
  • Since this answer is accepted, I have a suggestion. Using `split` with `/ /` is fine in most cases, but not always; I recommend changing it to OFS (because FS has already been set to another value). – BMW May 09 '14 at 00:32
  • The 3rd arg for `split` is a field separator, which is a regexp with special handling for a single blank char, `" "`. As such, the correct delimiters for an RE constant are the RE delimiters `/.../`, for both clarity and functionality (e.g. if you used string delimiters `"..."` then you'd need to double-escape any RE metacharacters to have them treated as literals). Splitting input using the Output Field Separator just because it coincidentally happens to be set to the same character that you want to split() your input on would just be unnecessary coupling and obfuscation. – Ed Morton May 09 '14 at 01:39
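
The difference between the string `" "` and the regex constant `/ /` as the third argument of `split()` can be seen in a small experiment. This is a sketch, not part of either answer: the function name `compare_split` is hypothetical, and the second input line has a deliberately added trailing space. The behaviour noted in the comments is what common awk implementations such as gawk and mawk are expected to produce.

```shell
# Hypothetical helper: for the text after the comma, report the number of
# pieces and the last piece for each separator form.
compare_split() {
    awk -F, '{
        n_str = split($2, s, " ")   # string " ": split on whitespace runs, ignore leading/trailing blanks
        n_re  = split($2, r, / /)   # regex / /: split on every single space
        printf "%d [%s] %d [%s]\n", n_str, s[n_str], n_re, r[n_re]
    }'
}

printf '%s\n' 'some string, this is number 1' | compare_split
# prints: 4 [1] 5 [1]
printf '%s\n' 'some string, this is number 1 ' | compare_split
# prints: 4 [1] 6 []
```

With the regex form, the space right after the comma produces an empty first element, so the number sits at index 5 rather than 4, and a trailing space leaves the last element empty. Which form is appropriate depends on whether the input can contain leading, trailing, or repeated spaces.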