112

I have a problem here. I have to print a column from a text file using awk. However, the columns are not separated by spaces; they are separated only by a single comma. It looks something like this:

column1,column2,column3,column4,column5,column6

How would I print out the 3rd column using awk?

Luke Girvin
user3364728
  • Why would you like to use `awk`? IMHO this is a very simple problem. Have you made any attempt to solve it? – TrueY Nov 10 '14 at 12:17
  • Possible duplicate of [Extract specific columns from delimited file using Awk](https://stackoverflow.com/q/7857090/608639), [How to print a range of columns in a CSV in AWK?](https://stackoverflow.com/q/25461806/608639), [How to extract one column of a csv file](https://stackoverflow.com/q/19602181/608639), etc. – jww May 10 '18 at 10:02

4 Answers

179

Try:

awk -F',' '{print $3}' myfile.txt

Here the -F option tells awk to use , as the field separator.
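
For example, with the sample line from the question saved in myfile.txt, the command picks out the third field:

$ cat myfile.txt
column1,column2,column3,column4,column5,column6
$ awk -F',' '{print $3}' myfile.txt
column3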

SMA
  • I have been browsing through a lot of pages and got a lot of results, and this was by far the best :) Thank you – AJC Sep 21 '17 at 17:11
47

If your only requirement is to print the third field of every line, with each field delimited by a comma, you can use cut:

cut -d, -f3 file
  • -d, sets the delimiter to a comma
  • -f3 specifies that only the third field is to be printed
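
As a quick check, assuming file holds the sample line from the question:

$ cut -d, -f3 file
column3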
Tom Fenech
  • This is the best answer for this question. `awk` comes in very handy when, let's say, I want to print `[col1]:[col5]` with different delimiters and different formatting – Arijoon Sep 28 '17 at 11:41
30

Try this awk:

awk -F, '{$0=$3}1' file
column3
  • -F, divides the fields by ,
  • $0=$3 sets the line to only field 3
  • 1 prints it all out (explained here)
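
For reference, the trailing 1 is a pattern that always evaluates to true, and awk's default action for a pattern with no action block is to print the current line, so the one-liner above behaves like this more explicit sketch:

awk -F, '{$0 = $3; print $0}' file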

This could also be used:

awk -F, '{print $3}' file
Jotne
  • As well as being shorter, this is much more difficult for someone unfamiliar with awk to understand. It would be worth adding some explanation to make this answer more useful. – Tom Fenech Nov 10 '14 at 11:33
  • +1. A little bit cryptic, but works like a Schaffhausen. – TrueY Nov 10 '14 at 12:06
  • @TomFenech: I think `cut -d, -f3 file` is as cryptic as this one if someone is unfamiliar with `cut`. ;) – TrueY Nov 10 '14 at 12:07
  • @TrueY granted, although one difference is that `cut --help` would explain everything that you needed to know, whereas `awk --help` wouldn't. Perhaps I should've gone for `cut --delimiter=, --fields=3 file`, although I have my doubts that the longer switches are portable :) – Tom Fenech Nov 10 '14 at 12:12
3

A simple, although awk-less, solution in bash:

while IFS=, read -r a a a b; do echo "$a"; done <inputfile

It works faster for small files (<100 lines) than awk, as it uses fewer resources (it avoids calling the expensive fork and execve system calls).
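
For example, with the sample line from the question in inputfile, read assigns a field to each listed variable in turn, so a is overwritten twice and ends up holding the third field while b soaks up the rest of the line:

$ while IFS=, read -r a a a b; do echo "$a"; done <inputfile
column3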

EDIT from Ed Morton (sorry for hijacking the answer, I don't know if there's a better way to address this):

To put to rest the myth that shell will run faster than awk for small files:

$ wc -l file
99 file

$ time while IFS=, read -r a a a b; do echo "$a"; done <file >/dev/null

real    0m0.016s
user    0m0.000s
sys     0m0.015s

$ time awk -F, '{print $3}' file >/dev/null

real    0m0.016s
user    0m0.000s
sys     0m0.015s

I expect that if you get a REALLY small enough file then you will see the shell script run a fraction of the blink of an eye faster than the awk script, but who cares?

And if you don't believe that it's harder to write robust shell scripts than awk scripts, look at this bug in the shell script you posted:

$ cat file
a,b,-e,d
$ cut -d, -f3 file
-e
$ awk -F, '{print $3}' file
-e
$ while IFS=, read -r a a a b; do echo "$a"; done <file

$
Ed Morton
TrueY
  • `While read` loops are significantly slower than awk; even if it were quicker with tiny files, the speed difference would be negligible. –  Nov 10 '14 at 12:35
  • @Jidder: You are right! IMHO that's why it is pointless to use awk for small files. – TrueY Nov 10 '14 at 13:12
  • I doubt it runs faster than awk for small files, and whichever is faster, the difference would be negligible, so why write all of that when you can simply write the much clearer, briefer, more extensible, more robust and more portable `awk -F, '{print $3}' file`? Avoiding calling "external" tools is not a goal of shell programming - the whole reason the shell exists is to be the glue that sequences calls to "external" tools. – Ed Morton Nov 10 '14 at 13:46
  • I edited your answer to show timing stats for your shell script and an awk script working on a 99-line file. As you can see, they are identical. The point, though, is that performance only matters for large files, and we can all agree that awk is faster for that; the awk script is superior to the bash script in every other important way, so there is absolutely no reason to write a long, complicated, error-prone, non-portable shell script to try to improve performance for a job where performance simply is not an issue. – Ed Morton Nov 10 '14 at 13:59
  • @EdMorton: You are right. For bigger and more complex problems I definitely use awk (more often called from a bash script), but for small files and simple tasks I use pure bash. On the other hand, for even more complex jobs I prefer perl for scripting. – TrueY Nov 10 '14 at 14:06
  • @EdMorton: "Avoiding calling "external" tools is not a goal of shell programming" - I fully agree with that. Bit to minimize the usage of resources can be good. Mostly if a scrip is run frequently. – TrueY Nov 10 '14 at 14:08
  • If someone posted a question like "I need to parse an input file that's 10 lines long and my awk script running in a fraction of the blink of an eye is too slow, how can I optimize it?", THEN maybe there'd be a reason to discuss re-writing the script in shell but that almost certainly wouldn't be the right approach then either. When someone says "how do I select column 3 from a file", though, then posting a shell loop is a long way from the right approach when there are shell tools that exist to do the job better in every measurable way for the general case and the OP may not know that. – Ed Morton Nov 10 '14 at 14:16
  • btw to show what I mean by shell language being hard to use correctly for text processing, I also edited your question to demonstrate a bug in your script. – Ed Morton Nov 10 '14 at 14:17
  • @EdMorton: You are right again, and it seems that with `echo` it cannot be solved easily. In that case one can use `printf "%s\n" "$a"` to get rid of that. This is also a bash built-in. – TrueY Nov 10 '14 at 14:29
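
A quick sketch of that printf variant, run against the test file from the edit above; unlike echo, printf does not treat -e as an option here, because it only appears as an argument to the %s format:

$ cat file
a,b,-e,d
$ while IFS=, read -r a a a b; do printf '%s\n' "$a"; done <file
-e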
  • Right, but notice that every time you write a shell loop to manipulate text you need to remember to a) set `IFS=` to stop the shell stripping white space even when white space is the intended delimiter, b) use `-r` with read to stop it interpreting backslashes, c) use `printf` instead of `echo` to avoid some values being interpreted by echo, d) add double quotes around your variables to avoid globbing and file name expansion, e) etc. The reason is that the shell's default behavior in all cases serves the shell's purpose - sequencing calls to tools and manipulating files and processes, NOT manipulating text. – Ed Morton Nov 10 '14 at 14:35
  • @EdMorton: You are right! If someone uses any tool, he/she needs to know the possible problems. So I suggest to anyone to use an adequate tool. If I write a kernel module I definitely shell not use bash. ;) But for such an itsy-bitsy task it may be enough. – TrueY Nov 10 '14 at 16:39
  • (Hihi. I made a typo: I shell not ...! ;) ) – TrueY Nov 11 '14 at 12:19