
Suppose we have this data file:

john 32 maketing executive
jack 41 chief technical officer
jim  27 developer
dela 33 assistant risk management officer

I want to print this using awk:

john maketing executive
jack chief technical officer
jim  developer
dela assistant risk management officer

I know it can be done using a for loop:

awk '{printf $1;  for(i=3;i<=NF;i++){printf " %s", $i} printf "\n"}' < file

The problem is that it's long and looks complex.

Is there a shorter way to print the rest of the fields?

Katie Kilian
Shiplu Mokaddim

7 Answers


Set the field(s) you want to skip to blank:

awk '{$2 = ""; print $0;}' < file_name

Source: Using awk to print all columns from the nth to the last
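For illustration, here is a hypothetical run of this command on the question's sample data. Note the side effect (discussed in another answer below): rebuilding $0 puts an OFS where $2 used to be, so each line gets a doubled space.

```shell
# Run the blank-the-field approach on two of the sample lines.
printf '%s\n' 'john 32 maketing executive' 'jim  27 developer' |
awk '{$2 = ""; print $0}'
# Output (two spaces where the age used to be):
#   john  maketing executive
#   jim  developer
```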

Barun

Reliably with GNU awk for gensub() when using the default FS:

$ gawk -v delNr=2 '{$0=gensub("^([[:space:]]*([^[:space:]]+[[:space:]]+){"delNr-1"})[^[:space:]]+[[:space:]]*","\\1","")}1' file
john maketing executive
jack chief technical officer
jim  developer
dela assistant risk management officer

With other awks, you need to use match() and substr() instead of gensub(). Note that the variable delNr above tells awk which field you want to delete:

$ gawk -v delNr=3 '{$0=gensub("^([[:space:]]*([^[:space:]]+[[:space:]]+){"delNr-1"})[^[:space:]]+[[:space:]]*","\\1","")}1' file
john 32 executive
jack 41 technical officer
jim  27
dela 33 risk management officer
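A sketch of the match()/substr() variant mentioned above, for awks without gensub(). It builds the same "everything before field delNr" regexp, keeps that prefix verbatim, and strips the unwanted field plus its trailing whitespace from the remainder (this is an illustration, not code from the original answer):

```shell
awk -v delNr=2 '{
  # RE for the whitespace-preserving prefix before field delNr
  keep = "^[[:space:]]*"
  for (i = 1; i < delNr; i++)
    keep = keep "[^[:space:]]+[[:space:]]+"
  if (match($0, keep)) {
    head = substr($0, 1, RLENGTH)               # part to keep, spacing intact
    tail = substr($0, RLENGTH + 1)
    sub(/^[^[:space:]]+[[:space:]]*/, "", tail) # drop field delNr + following spaces
    $0 = head tail
  }
}
1' file
```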

Do not do this:

awk '{sub($2 OFS, "")}1'

as the same text that's in $2 might be at the end of $1, and/or $2 might contain RE metacharacters so there's a very good chance that you'll remove the wrong string that way.
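A hypothetical line demonstrating the metacharacter problem: if $2 is `2*`, the dynamic regexp means "zero or more 2s followed by a space", which matches the lone space after $1 instead of the field itself.

```shell
printf 'a 2* b\n' | awk '{sub($2 OFS, "")}1'
# prints "a2* b" instead of the intended "a b":
# the regexp "2* " first matches the single space after "a"
```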

Do not do this:

awk '{$2=""}1' file

as it adds an FS and will compress all other contiguous white space between fields into a single blank char each.
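A small made-up example of that compression: assigning to any field rebuilds $0 with single OFS characters between fields, so the three spaces between "b" and "c" collapse.

```shell
printf 'a 1 b   c\n' | awk '{$2 = ""}1'
# prints "a  b c": a doubled OFS where $2 was,
# and the original "b   c" squeezed to "b c"
```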

Do not do this:

awk '{$2="";sub("  "," ")}1' file

as it has the space-compression issue mentioned above and relies on a hard-coded FS of a single blank (the default, though, so maybe not so bad), but more importantly, if there were spaces before $1 it would remove one of those instead of the space it's adding between $1 and $2.

One last thing worth mentioning is that in recent versions of gawk there is a new function named patsplit() which works like split() BUT, in addition to creating an array of the fields, it also creates an array of the spaces between the fields. That means you can manipulate the fields and the spaces between them within the arrays, so you don't have to worry about awk recompiling the record using OFS when you manipulate a field. Then you just have to print the fields you want from the arrays. See patsplit() in http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions for more info.
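A sketch of that patsplit() idea (requires gawk 4.0+; the variable names are illustrative). `f[i]` holds the fields, `s[i]` the whitespace after field i (`s[0]` is any whitespace before the first field), so skipping one index drops the field and its trailing spaces while every other space is reproduced verbatim:

```shell
gawk -v delNr=2 '{
  n = patsplit($0, f, /[^[:space:]]+/, s)  # f[]=fields, s[]=the spaces between them
  out = s[0]                               # whitespace before the first field
  for (i = 1; i <= n; i++)
    if (i != delNr)                        # skip the field to delete + its spaces
      out = out f[i] s[i]
  print out
}' file
```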

Ed Morton
    Looking at these complications one wonders whether awk is indeed the best tool for this job. e.g. if fields are delimited by pipe or comma then whole awk code needs to be rewritten. – anubhava Aug 27 '13 at 13:54
  • Depends on your input. If you have single chars between fields then `cut` is better. If you have anything else then gawk+gensub() or sed (very similar syntactically) might be the best options. Both of those can run into problems when trying to describe the negation of multi-char REs so then you need to look at gawk+patsplit() or gawk+FPAT. No silver bullet unfortunately. – Ed Morton Aug 27 '13 at 13:57
  • Great answer, I wish I could +2 you. One problem is the code is much longer than the `for` loop solution. – Shiplu Mokaddim Aug 27 '13 at 17:13
  • @shiplu.mokadd.im - correct but it preserves the original white space whereas the for loop you posted will not produce the output you specified. By the way, wrt that for loop you posted - never use printf with input data, e.g. `printf $1` as that will fail spectacularly if your input data contains printf formatting characters such as `%`. Always use `printf "%s",$1` for printing input data instead. Also to print a newline is just `print ""`, no need for `printf "\n"`. – Ed Morton Aug 27 '13 at 18:42

You can use simple awk like this:

awk '{$2=""}1' file

However, this will leave an extra OFS in your output, which can be avoided with this awk:

awk '{sub($2 OFS, "")}1' file

Or by using this tr and cut combo:

On Linux:

tr -s ' ' < file | cut -d ' ' -f1,3-

On OSX:

tr -s ' ' < file | cut -d ' ' -f1 -f3-
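A hypothetical run of the tr+cut combo on one sample line. Note that `tr -s` squeezes the double space after "jim", so the original spacing is lost (unlike the gensub() answer above):

```shell
printf 'jim  27 developer\n' | tr -s ' ' | cut -d ' ' -f1,3-
# prints "jim developer"
```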
anubhava
  • This should be `cut -d' ' -f1,3-`. – Adrian Frühwirth Aug 27 '13 at 05:57
  • @AdrianFrühwirth: Thanks but `cut -f1,3-` is not portable and isn't supported on my OSX. – anubhava Aug 27 '13 at 06:04
  • Then the OSX `cut` is broken. [POSIX](http://pubs.opengroup.org/onlinepubs/007904975/utilities/cut.html) says that *The application shall ensure that the option-argument list (see options -b, -c, and -f below) is a comma-separated list [...]* and `-f -f` does break on my Linux with `coreutils-8.16` with the error message *cut: only one type of list may be specified*. – Adrian Frühwirth Aug 27 '13 at 06:06
  • I will provide both options with my comments above. Though I would prefer `awk '{sub($2 OFS, "")}1' file` as a solution for this problem. – anubhava Aug 27 '13 at 06:07
  • Your edit is not quite correct ;-) Is that why it did not work for you? – Adrian Frühwirth Aug 27 '13 at 06:09
  • @AdrianFrühwirth: No that was typo in my edit. I did try `cut -d ' ' -f1,f3-` and got `cut: [-cf] list: illegal list value` on OSX. – anubhava Aug 27 '13 at 06:26
  • Your mistake is still there, it is `-f1,3-` and not `-f1,f3-`. – Adrian Frühwirth Aug 27 '13 at 06:31
  • 1
    You shouldn't use `awk '{sub($2 OFS, "")}1'` since the same text that's in $2 might be at the end of $1, and/or $2 might contain RE metacharacters so there's a very good chance that you'll remove the wrong string that way. – Ed Morton Aug 27 '13 at 12:06
  • @EdMorton: Valid point about regex. I somehow thought if `/str/` form isn't used in `sub` then its treated as literal text rather than as regex. – anubhava Aug 27 '13 at 12:35
  • 2
    @anubhava - no, the only awk function that looks for strings rather than REs in another string is index(). – Ed Morton Aug 27 '13 at 12:46
  • Thanks @EdMorton So looks like there is no simple way to delete a field with trailing OFS. (`index`+`substr` is the only possible way I think) – anubhava Aug 27 '13 at 13:41
  • 1
    @anubhava - correct there's no simple way but see my answer for a robust way. – Ed Morton Aug 27 '13 at 13:50

This removes field #2 and cleans up the extra space:

awk '{$2="";sub("  "," ")}1' file
Jotne

Another way is to just use sed to delete the first run of digits and the spaces that follow it:

sed 's|[0-9]\+\s\+||' file
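Two caveats worth noting: `\+` and `\s` are GNU sed extensions, and the pattern assumes the first digit run on the line is the field to remove (it would misfire if $1 contained digits). A POSIX-portable spelling of the same idea, shown on one sample line:

```shell
printf 'john 32 maketing executive\n' | sed 's/[0-9][0-9]*[[:space:]]*//'
# prints "john maketing executive"
```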

konsolebox

An awk approach that does not require gawk or any state mutation:

awk '{print $1 " " substr($0, index($0, $3));}' datafile

Update

A solution that is a bit longer, but will stand up to the case where $1 or $2 contains $3:

awk '{print $1 " " substr($0, length($1 $2) + 1);}' data

Or, even more robust if you have a custom field separator:

awk '{print $1 " " substr($0, length($1 FS $2 FS) + 1);}' data
Zapko

Do not do this by altering $n. If there are multiple spaces in a part you want to keep, they will be reduced to one.