Using multiple delimiters in awk

Question

I have a file that contains the following lines:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

From each line above I want to extract 3 fields (numbers 2 and 4, and the last one, *.example.com). Currently I am getting the following output:

cat file | awk -F'/' '{print $3 "\t" $5}'
tc0001   tomcat7.1
tc0001   tomcat7.2
tc0001   tomcat7.5

How do I also extract the last field, the domain name that comes after '='? How do I use multiple delimiters to extract fields?

To answer my own question, which is similar but not identical: `awk` was swallowing blank fields, which threw off the field numbering. I changed `-F " "` to `-F "[ ]"` and `awk` no longer swallowed the empty fields. – Adam Apr 26 '17 at 15:29
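A minimal sketch of the difference described in that comment, assuming a made-up line with two consecutive spaces (so there is one empty field in the middle):

```shell
# Hypothetical input: the two spaces between "a" and "c" enclose an empty field.
line='a  c'

# -F ' ' selects awk's default mode: a run of blanks counts as one separator.
default_nf=$(printf '%s\n' "$line" | awk -F ' ' '{print NF}')

# -F '[ ]' is a regex matching exactly one space, so the empty field survives.
regex_nf=$(printf '%s\n' "$line" | awk -F '[ ]' '{print NF}')

printf 'default: %s fields, [ ]: %s fields\n' "$default_nf" "$regex_nf"
```

The first count should be 2 and the second 3, which is why the field numbers shift between the two forms.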

score 452 · Accepted Answer · edited Apr 13 '17 at 10:01

The delimiter can be a regular expression.

awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file

Produces:

tc0001   tomcat7.1    demo.example.com  
tc0001   tomcat7.2    quest.example.com  
tc0001   tomcat7.5    www.example.com
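Note that with `-F'[/=]'` the last field still starts with the space that followed `=`. One possible variation, sketched here on a single sample line from the question, folds the spaces around `=` into the separator itself:

```shell
# Sample line copied from the question.
line='/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com'

# Split on "/" or on "=" together with any spaces around it,
# so the last field carries no leading blank.
out=$(printf '%s\n' "$line" | awk -F'/| *= *' -v OFS='\t' '{print $3, $5, $NF}')
printf '%s\n' "$out"
```

Using `$NF` instead of `$8` also keeps the command working if the path depth changes, as long as the domain stays last.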

answered Aug 30 '12 at 19:47 by embedded.kyle; edited by fedorqui

Of course, `cat` process is not required: `awk '...' file`. Also, it would be tidier to use the output field separator: `awk -F'[/=]' -v OFS="\t" '{print $3, $5, $8}'` – glenn jackman Aug 30 '12 at 22:02
Awk delimiters can be regular expressions... this made my day! – das.cyklone Apr 03 '14 at 16:32
@das.cyklone: awk can also take several separators joined with `|`, e.g. `awk -F 'this|that|[=/]' '......'` (useful when words or strings separate the data). Note that this keeps the spaces in the fields between two separators. Adding `|[ \t]+` as well can be useful but gets tricky: since there are often spaces before and after 'this', two extra empty fields will appear between the space(s) and 'this'. – Olivier Dulac Oct 15 '14 at 13:36
I've tried this on 2 different distros and I get the same behavior. I want to get the port from `netstat -ntpl`. `netstat -ntpl | sed 's/:/ /' | awk '{print $5}'` works, but I could do without the double piping. This also works, but I was not expecting the data in field 17: `netstat -ntpl | awk -F" |:" '{print $17}'` – louigi600 May 10 '17 at 14:29
@louigi600 The problem is your delimiter expression `" |:"` is splitting on _every_ space character as well as the colon. Which is why your data is on field 17. If you split on _groups_ of spaces, your port will be in field 5 as expected. `netstat -ntpl | awk -F " *|:" '{print $5}'` – embedded.kyle May 10 '17 at 17:52
yes ... this got me what I wanted: awk -F"[ :]+" '/\/postmaster *$/ {print $5}' – louigi600 May 11 '17 at 07:47
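The effect discussed in this exchange can be sketched on a single made-up line. The line below only imitates `netstat -ntpl` output; real column widths and values vary by system:

```shell
# Hypothetical netstat-style line (not captured from a real system).
line='tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      1234/postmaster'

# Every single space or colon is a separator, so each extra space
# in the padding produces an empty field.
nf_single=$(printf '%s\n' "$line" | awk -F' |:' '{print NF}')

# A run of spaces and/or colons counts as one separator.
nf_runs=$(printf '%s\n' "$line" | awk -F'[ :]+' '{print NF}')
port=$(printf '%s\n' "$line" | awk -F'[ :]+' '{print $5}')

printf 'fields: %s vs %s, port: %s\n' "$nf_single" "$nf_runs" "$port"
```

With the run-based separator the port lands in field 5, while the per-character separator yields many more fields.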
How would we handle a two-character delimiter such as "^|"? – user3040157 May 07 '20 at 13:36
@user3040157 : `awk -F'[|^]' '…..'` …. but when you say `"^|"`, do you want to match a pipe at start of line, or actually match caret+pipe in the middle of a line ? – RARE Kpop Manifesto Dec 19 '22 at 06:26
@RAREKpopManifesto: actually match caret+pipe in the middle of a line. – user3040157 Dec 20 '22 at 07:15
@user3040157 then why not just `FS = "\\^[|]"` – RARE Kpop Manifesto Dec 21 '22 at 08:55
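A minimal sketch for a literal caret followed by a pipe used as one delimiter, assuming a made-up record. The order inside the regex matters: the caret part comes first, then the pipe.

```shell
# Hypothetical record using the two-character delimiter "^|".
line='alpha^|beta^|gamma'

# "\\^" in the awk string becomes the regex \^ (a literal caret);
# "[|]" is a bracket expression matching a literal pipe.
out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\^[|]" } { print $2 }')
printf '%s\n' "$out"
```

Escaping both characters keeps them from being read as the start-of-line anchor and the alternation operator.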

score 74 · Answer 2 · edited Aug 31 '21 at 05:57

Good news! awk's field separator can be a regular expression. You just need to use -F"<separator1>|<separator2>|...":

awk -F"/|=" -vOFS='\t' '{print $3, $5, $NF}' file

Returns:

tc0001  tomcat7.1  demo.example.com
tc0001  tomcat7.2  quest.example.com
tc0001  tomcat7.5  www.example.com

Here:

-F"/|=" sets the input field separator to either / or =.
-vOFS='\t' uses the -v flag to set a variable. OFS is the built-in variable for the Output Field Separator, and here it is set to the tab character. The -v flag is necessary because OFS has no dedicated command-line option the way FS has -F.
{print $3, $5, $NF} prints the 3rd, 5th and last fields based on the input field separator.
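The same settings can also be made inside the program. A sketch, assuming one sample line from the question arrives on standard input, with FS and OFS assigned in a BEGIN block instead of through -F and -v:

```shell
# Sample line copied from the question.
line='/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com'

# BEGIN runs before the first record is read, so FS is in effect
# when the line is split and OFS when it is printed.
out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "/|="; OFS = "\t" } { print $3, $5, $NF }')
printf '%s\n' "$out"
```

Because only `/` and `=` are separators here, the last field keeps the space that followed `=`.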

See another example:

$ cat file
hello#how_are_you
i#am_very#well_thank#you

This file has two field separators, # and _. To print the second field regardless of which separator is used, make both of them separators:

$ awk -F"#|_" '{print $2}' file
how
am

Where the fields are numbered as follows:

hello#how_are_you           i#am_very#well_thank#you
^^^^^ ^^^ ^^^ ^^^           ^ ^^ ^^^^ ^^^^ ^^^^^ ^^^
  1    2   3   4            1  2   3    4    5    6
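The numbering shown above can be checked by letting awk list every field with its index. A small sketch using the second sample line:

```shell
# Second line of the example file.
line='i#am_very#well_thank#you'

# Loop from 1 to NF and print each field number next to its content.
out=$(printf '%s\n' "$line" | awk -F'#|_' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$out"
```

This kind of loop is a quick way to find the right field number before writing the final `print`.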

score 10 · Answer 3 · edited Jun 18 '20 at 12:57

Another option is to pass -F a regex that splits on either parenthesis, so you can print the text between ( and ).

The file content:

528(smbw)
529(smbt)
530(smbn)
10115(smbs)

The command:

awk -F"[()]" '{print $2}' filename

result:

smbw
smbt
smbn
smbs

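Using awk to print just the text between []:

Use awk -F'[][]'; awk -F'[[]]' will not work. Inside a bracket expression, ] is literal only when it comes first, so '[][]' matches either bracket, while '[[]]' is parsed as the class '[[]' followed by a literal ].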
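A short sketch of the square-bracket case, assuming a made-up line in the same shape as the parenthesis example:

```shell
# Hypothetical input modelled on the file above, with [] instead of ().
line='528[smbw]'

# '[][]' is a bracket expression holding "]" (listed first, so literal) and "[".
out=$(printf '%s\n' "$line" | awk -F'[][]' '{print $2}')
printf '%s\n' "$out"
```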

http://stanlo45.blogspot.com/2020/06/awk-multiple-field-separators.html

answered Jun 18 '20 at 09:09 by Stan Lovisa; edited by Eric Leschinski

Your answer came up in the deletion queue because 9 times out of 10, users with 1 reputation linking to their own blog usually is spam. But yours is the exception to the rule. The last 10 years of content there is a gold mine, hopefully you have a plan to immortalize it. – Eric Leschinski Jun 18 '20 at 15:00

score 6 · Answer 4 · answered Aug 30 '12 at 19:51

If your whitespace is consistent, you could use it as a delimiter. Also, instead of inserting \t directly, you could set the output field separator and it will be inserted automatically:

< file awk -v OFS='\t' -v FS='[/ ]' '{print $3, $5, $NF}'

answered by Thor

You can skip the `-v` portion by placing both `FS=..` and `OFS=…` after the program text (the same place where data files are listed), as long as they come before the file name. Even though they follow the program, the assignments still take effect in time for the first data row, since you don't have a `BEGIN { }` block that needs them. – RARE Kpop Manifesto Jul 08 '23 at 15:21
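A sketch of that operand form, assuming the sample line is written to a temporary file first (the temp file is only there to make the example self-contained):

```shell
# Write one sample line from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' '/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com' > "$tmp"

# FS=... and OFS=... are operands placed before the file name,
# so they are assigned just before awk starts reading that file.
out=$(awk '{print $3, $5, $NF}' FS='[/ ]' OFS='\t' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Unlike `-v`, such assignments are not visible inside a `BEGIN` block, which is the trade-off the comment refers to.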

score 5 · Answer 5 · edited Mar 22 '15 at 14:52

For a field separator made up of a run of 2 to 6 characters, each of which is a digit 2 through 5, the letter a, a # or a space (the characters in the run need not be identical), for example:

awk -F'[2-5a# ]{2,6}' ...

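Variations using ( ) grouping and other repetition counts are possible too. Note that {n,m} interval expressions are not supported by every awk: older gawk versions need --re-interval, and some other implementations may lack them entirely.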

answered Mar 22 '15 at 13:50 by genome; edited by Michael Jaros

Chris Koknat · Answer 6 · 2015-10-13T23:38:32.183

3

Perl one-liner:

perl -F'/[\/=]/' -lane 'print "$F[2]\t$F[4]\t$F[7]"' file

These command-line options are used:

-n loop around every line of the input file, put the line in the $_ variable, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – perl will automatically split input lines into the @F array. Defaults to splitting on whitespace
-F autosplit modifier, in this example splits on either / or =
-e execute the perl code

Perl is closely related to awk; however, the @F autosplit array starts at index $F[0], while awk fields start with $1.

answered Sep 09 '15 at 16:58 by Chris Koknat

does `perl` allow end user to create an array named `@F` if the `-a` flag wasn't set ? – RARE Kpop Manifesto Jul 08 '23 at 15:17
it should since the @F array isn't special – Chris Koknat Jul 09 '23 at 17:08
ahhh thanks. i just thought `perl` might make it more consistent - since var-args for subroutines are auto split into `@_` then wouldn't auto splitting the main input row `$_` also into a scope-dependent `@_` be more intuitive instead of `@F` ? – RARE Kpop Manifesto Jul 12 '23 at 04:15

score 0 · Answer 7 · answered Feb 25 '15 at 14:38

I see many perfect answers are already up on the board, but I would still like to add my piece of code too:

awk -F"/" '{print $3 " " $5 " " $7}' sam | sed 's/ cat.* =//g'

answered by Sadhun

`print $3 " " $5 " " $7` can be written simply as `print $3, $5, $7`. Also, I don't see the advantage of using awk and then piping to sed. In general, awk alone suffices, as the other answers show. – fedorqui Feb 25 '15 at 14:50
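One way the pipeline above could be kept inside awk alone is sketched here, assuming the same `/`-only field separator and using `sub()` in place of the `sed` step:

```shell
# Sample line copied from the question.
line='/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com'

# Field 7 holds "catalina.properties:... = demo.example.com";
# sub() strips everything up to and including "= " from it.
out=$(printf '%s\n' "$line" | awk -F'/' '{ sub(/^cat.* = */, "", $7); print $3, $5, $7 }')
printf '%s\n' "$out"
```

The fields are joined by the default output separator, a single space, matching the original command's output.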

jubilatious1 · Answer 8 · 2021-11-23T19:15:10.293

Using Raku (formerly known as Perl_6)

raku -ne '.split(/ <[/=]> /).[2,4,7].put;'

Sample Input:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

Sample Output:

tc0001 tomcat7.1  demo.example.com
tc0001 tomcat7.2  quest.example.com
tc0001 tomcat7.5  www.example.com

Above is a solution coded in Raku, a member of the Perl-family of programming languages. Briefly, input is read linewise with the -ne (linewise, non-autoprinting) commandline flags. Lines are split on a regex which consists of a custom character class (/=) created with the <[ ]> operator. Elements [2,4,7] are then put to give the results above.

Of course, the above is a 'bare-bones' implementation, and Raku being a Perl-family language, TMTOWTDI applies. So lines can be split on literal characters separated by a | "OR" operator. Element numbering (which is zero-indexed in both Perl and Raku) can be tightened up adding the :skip-empty adverb to the split routine. Whitespace can be trim-med away from each element (using map), and the desired elements (now [1,3,6]) are join-ed with \t tabs, giving the following result:

raku -ne '.split(/ "/" | "=" /, :skip-empty).map(*.trim).[1,3,6].join("\t").put;' file
tc0001  tomcat7.1   demo.example.com
tc0001  tomcat7.2   quest.example.com
tc0001  tomcat7.5   www.example.com

https://raku.org
