
I have a file that contains the following lines:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

From the lines above I want to extract three fields: number 2, number 4, and the last one (*.example.com). So far I am getting the following output:

cat file | awk -F'/' '{print $3 "\t" $5}'
tc0001   tomcat7.1
tc0001   tomcat7.2
tc0001   tomcat7.5

How do I also extract the last field, the domain name that comes after '='? How do I use multiple delimiters to extract fields?

codeforester
Satish
  • To answer my own question, which is similar but different: `awk` was swallowing fields when they were blank, which broke the field numbering. I changed `-F " "` to `-F "[ ]"` and `awk` stopped swallowing the empty fields. – Adam Apr 26 '17 at 15:29
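A minimal sketch of the difference Adam describes, on a made-up sample line. With FS set to a single space, awk uses its default whitespace splitting (a run of blanks counts as one separator and leading blanks are ignored); the bracket expression `[ ]` is instead treated as a regex that matches exactly one space, so two consecutive spaces produce an empty field:

```shell
# Made-up sample: two spaces between "a" and "b".
printf 'a  b\n' | awk -F' ' '{ print NF }'    # default splitting: prints 2
printf 'a  b\n' | awk -F'[ ]' '{ print NF }'  # one space per split: prints 3 ("a", "", "b")
```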

8 Answers


The delimiter can be a regular expression.

awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file

Produces:

tc0001   tomcat7.1    demo.example.com  
tc0001   tomcat7.2    quest.example.com  
tc0001   tomcat7.5    www.example.com
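With `[/=]` the eighth field keeps the space that follows the `=` (it is ` demo.example.com`). A minimal sketch, assuming the spaces around `=` are the only ones you want to drop, folds optional spaces into the separator regex and uses OFS instead of literal tabs:

```shell
# ' *[/=] *' matches a / or = together with any spaces around it,
# so the domain name comes out without a leading space.
awk -F' *[/=] *' -v OFS='\t' '{ print $3, $5, $8 }' <<'EOF'
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
EOF
```

Because the line starts with `/`, `$1` is empty, which is why the wanted values are in `$3`, `$5` and `$8`.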
fedorqui
embedded.kyle
  • Of course, the `cat` process is not required: `awk '...' file`. Also, it would be tidier to use the output field separator: `awk -F'[/=]' -v OFS="\t" '{print $3, $5, $8}'` – glenn jackman Aug 30 '12 at 22:02
  • Awk delimiters can be regular expressions... this made my day! – das.cyklone Apr 03 '14 at 16:32
  • @das.cyklone: awk can also have several separators, joined with `|`, e.g. `awk -F 'this|that|[=/]' '......'` (useful when words or strings separate things). Note that this keeps the spaces in the fields between two separators. Also adding `|[ \t]+` can be useful, but can make things tricky: as there are often spaces before and after 'this', it will make two extra empty fields appear between the space(s) and 'this'. – Olivier Dulac Oct 15 '14 at 13:36
  • I've tried this on two different distros and I get the same behavior. I want to get the port from `netstat -ntpl`: `netstat -ntpl | sed 's/:/ /' | awk '{print $5}'` works, but I could do without the double piping. This also works, but I was not expecting the data in field 17: `netstat -ntpl | awk -F" |:" '{print $17}'` – louigi600 May 10 '17 at 14:29
  • @louigi600 The problem is your delimiter expression `" |:"` is splitting on _every_ space character as well as the colon. Which is why your data is on field 17. If you split on _groups_ of spaces, your port will be in field 5 as expected. `netstat -ntpl | awk -F " *|:" '{print $5}'` – embedded.kyle May 10 '17 at 17:52
  • Yes, this got me what I wanted: `awk -F"[ :]+" '/\/postmaster *$/ {print $5}'` – louigi600 May 11 '17 at 07:47
  • How can I handle a delimiter made of two characters, such as "^|"? – user3040157 May 07 '20 at 13:36
  • @user3040157: `awk -F'[|^]' '…..'`, but when you say `"^|"`, do you want to match a pipe at the start of a line, or actually match caret+pipe in the middle of a line? – RARE Kpop Manifesto Dec 19 '22 at 06:26
  • @RAREKpopManifesto: actually match caret+pipe in the middle of a line. – user3040157 Dec 20 '22 at 07:15
  • @user3040157 then why not just `FS = "\\^[|]"` – RARE Kpop Manifesto Dec 21 '22 at 08:55
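Two points from the comment thread above can be sketched on made-up input (the netstat-style line below is invented for illustration, not real netstat output). A separator regex without `+` splits on every single character, so runs of spaces create empty fields; adding `+` treats the whole run as one separator. For the two-character sequence caret+pipe, the caret is escaped so it is not read as a start-of-line anchor, and the backslash is doubled because FS is assigned from an awk string literal:

```shell
# Made-up netstat-style line (not real netstat output).
line='tcp  0  0 0.0.0.0:22  0.0.0.0:*  LISTEN'

# [ :] splits on every single space or colon, so runs of spaces
# produce empty fields and push the port further right.
printf '%s\n' "$line" | awk -F'[ :]' '{ print NF }'    # prints 12

# [ :]+ treats a whole run of spaces/colons as one separator.
printf '%s\n' "$line" | awk -F'[ :]+' '{ print $5 }'   # prints 22

# Literal two-character separator ^| (caret, then pipe).
printf 'alpha^|beta^|gamma\n' | awk 'BEGIN { FS = "\\^[|]" } { print $2 }'  # prints beta
```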

Good news! The awk field separator can be a regular expression. You just need to use -F"<separator1>|<separator2>|...":

awk -F"/|=" -vOFS='\t' '{print $3, $5, $NF}' file

Returns:

tc0001  tomcat7.1  demo.example.com
tc0001  tomcat7.2  quest.example.com
tc0001  tomcat7.5  www.example.com

Here:

  • -F"/|=" sets the input field separator to either / or =.

  • -vOFS='\t' uses the -v flag to set a variable. OFS is the built-in variable that holds the Output Field Separator, and here it is set to the tab character. The flag is needed because awk has no dedicated option for OFS the way -F exists for the input separator.

  • {print $3, $5, $NF} prints the 3rd, 5th and last fields based on the input field separator.
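The same separators can also be set in a `BEGIN` block instead of on the command line. A minimal sketch on the first line from the question; note that `print $3, $5, $NF` puts OFS between the fields, whereas `print $3 $5` would concatenate them with nothing in between. With `/|=`, the last field still begins with the space that followed the `=`:

```shell
# Same separators as -F"/|=" -vOFS='\t', but assigned inside the program.
echo '/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com' |
awk 'BEGIN { FS = "/|="; OFS = "\t" } { print $3, $5, $NF }'
```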


See another example:

$ cat file
hello#how_are_you
i#am_very#well_thank#you

This file has two field separators, # and _. If we want to print the second field regardless of which separator precedes it, let's make both of them separators!

$ awk -F"#|_" '{print $2}' file
how
am

Where the fields are numbered as follows:

hello#how_are_you           i#am_very#well_thank#you
^^^^^ ^^^ ^^^ ^^^           ^ ^^ ^^^^ ^^^^ ^^^^^ ^^^
  1    2   3   4            1  2   3    4    5    6
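When it is not obvious which $N holds what, a small loop over NF can print every field with its number. A minimal sketch using the same two sample lines:

```shell
# Print each field with its line number and field number, one per output line.
printf 'hello#how_are_you\ni#am_very#well_thank#you\n' |
awk -F'#|_' '{ for (i = 1; i <= NF; i++) printf "line %d: $%d = %s\n", NR, i, $i }'
```

The first output line is `line 1: $1 = hello` and the last is `line 2: $6 = you`.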
BUFU
fedorqui

Another option is to pass -F a regex, so that you can print the text between a left and a right parenthesis ().

The file content:

528(smbw)
529(smbt)
530(smbn)
10115(smbs)

The command:

awk -F"[()]" '{print $2}' filename

result:

smbw
smbt
smbn
smbs

Using awk to just print the text between []:

Use awk -F'[][]' but awk -F'[[]]' will not work.
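A minimal sketch of the bracket case on a made-up input line. Under POSIX bracket-expression rules, a `]` that comes first inside the brackets is taken literally, so `[][]` is the set of `]` and `[`. By contrast, `[[]]` parses as the set `[[]` (just `[`) followed by a literal `]`, so it only matches the two-character sequence `[]`:

```shell
# [][] = bracket expression containing "]" (literal because it is first) and "[".
printf 'id[abc]\n' | awk -F'[][]' '{ print $2 }'   # prints abc
```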

http://stanlo45.blogspot.com/2020/06/awk-multiple-field-separators.html

Eric Leschinski
Stan Lovisa
  • Your answer came up in the deletion queue because, 9 times out of 10, answers from users with 1 reputation that link to their own blog are spam. But yours is the exception to the rule. The last 10 years of content there is a gold mine; hopefully you have a plan to immortalize it. – Eric Leschinski Jun 18 '20 at 15:00

If your whitespace is consistent, you could use it as a delimiter. Also, instead of inserting \t directly, you could set the output separator, and it will be inserted between the printed fields automatically:

< file awk -v OFS='\t' -v FS='[/ ]' '{print $3, $5, $NF}'
Thor
  • You can skip the `-v` portions by placing both `FS=..` and `OFS=…` to the right of the program text (the same place where data files are listed). Even though they come after the program, the assignments still take effect in time for the first data row, because there is no `BEGIN { }` block here that would need them earlier (a `BEGIN` block runs before such assignments are processed). – RARE Kpop Manifesto Jul 08 '23 at 15:21
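A minimal sketch of the operand-assignment form described in the comment above, fed the first sample line on standard input (the `-` operand names stdin explicitly). With a real file you would write `awk '{ print $3, $5, $NF }' FS='[/ ]' OFS='\t' file`:

```shell
# FS and OFS are operands: awk applies them just before it starts
# reading the input operand that follows them (here "-", i.e. stdin).
echo '/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com' |
awk '{ print $3, $5, $NF }' FS='[/ ]' OFS='\t' -
```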

For a field separator made of any digit 2 through 5, the letter a, # or a space, where the separator must be a run of at least 2 and at most 6 such characters (the characters in the run need not be the same), for example:

awk -F'[2-5a# ]{2,6}' ...

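Variations of this are possible using grouping with ( ) and different interval bounds. Be aware that older awk versions may not support interval expressions like {2,6} (gawk before 4.0, for example, needed --re-interval).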

Michael Jaros
genome

Perl one-liner:

perl -F'/[\/=]/' -lane 'print "$F[2]\t$F[4]\t$F[7]"' file

These command-line options are used:

  • -n loop around every line of the input file, put the line in the $_ variable, do not automatically print every line

  • -l removes newlines before processing, and adds them back in afterwards

  • -a autosplit mode – perl will automatically split input lines into the @F array. Defaults to splitting on whitespace

  • -F autosplit modifier, in this example splits on either / or =

  • -e execute the perl code

Perl is closely related to awk; however, the @F autosplit array starts at index $F[0], while awk fields start at $1.

Chris Koknat

I see many good answers here already, but I would still like to share my piece of code too:

awk -F"/" '{print $3 " " $5 " " $7}' sam | sed 's/ cat.* =//g'

Sadhun
  • `print $3 " " $5 " " $7` can be written simply as `print $3, $5, $7`. Also, I don't see the advantage of using awk and then piping to sed. In general, awk alone suffices, as other answers show. – fedorqui Feb 25 '15 at 14:50
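As the comment above suggests, awk can do the sed step itself. A minimal sketch using `sub()` on the seventh field (everything after the last `/`), shown on the first sample line:

```shell
# Split on / only, then delete everything up to "= " in field 7,
# so no separate sed process is needed.
echo '/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com' |
awk -F'/' '{ sub(/.*= */, "", $7); print $3, $5, $7 }'
```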

Using Raku (formerly known as Perl_6)

raku -ne '.split(/ <[/=]> /).[2,4,7].put;'

Sample Input:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

Sample Output:

tc0001 tomcat7.1  demo.example.com
tc0001 tomcat7.2  quest.example.com
tc0001 tomcat7.5  www.example.com

Above is a solution coded in Raku, a member of the Perl family of programming languages. Briefly, input is read linewise with the -ne (linewise, non-autoprinting) command-line flags. Lines are split on a regex consisting of a custom character class, <[/=]>, created with the <[ ]> syntax. Elements [2,4,7] are then put (printed) to give the results above.

Of course, the above is a 'bare-bones' implementation, and since Raku is a Perl-family language, TMTOWTDI applies. For example, lines can be split on literal characters separated by a | "OR" operator. Element numbering (which is zero-indexed in both Perl and Raku) can be tightened up by adding the :skip-empty adverb to the split routine. Whitespace can be trim-med away from each element (using map), and the desired elements (now [1,3,6]) are join-ed with \t tabs, giving the following result:

raku -ne '.split(/ "/" | "=" /, :skip-empty).map(*.trim).[1,3,6].join("\t").put;' file
tc0001  tomcat7.1   demo.example.com
tc0001  tomcat7.2   quest.example.com
tc0001  tomcat7.5   www.example.com

https://raku.org

jubilatious1