I was given this question during an interview. I said I could do it with Java or Python, e.g. something like xreadlines() to traverse the whole file and fetch the column, but the interviewer wanted me to use just a Linux command. How can I achieve that?

Pythoner
- Given the brevity here, I'm giving you a complete answer the laziest way possible: first go here http://stackoverflow.com/questions/1521462/looping-through-the-content-of-a-file-in-bash then go here http://stackoverflow.com/questions/19737675/shell-script-how-to-extract-string-using-regular-expressions and use `^.*,(.*),.*$` (or something to that effect) for the regex. – Deryck May 11 '16 at 02:59
3 Answers
You can use the `awk` command.
Below is an example of printing out the second column of a file:
awk -F, '{print $2}' file.txt
And to store it, you redirect it into a file:
awk -F, '{print $2}' file.txt > output.txt
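Putting it together, here is a minimal runnable sketch (the file name `file.txt` comes from the answer; the sample rows are assumed for illustration):

```shell
# Create a small sample CSV (contents assumed for illustration)
printf 'alice,30,NY\nbob,25,LA\ncarol,41,SF\n' > file.txt

# -F, sets the field separator to a comma; $2 is the second field
awk -F, '{print $2}' file.txt
# prints:
# 30
# 25
# 41

# Redirect the same output into a file instead of the terminal
awk -F, '{print $2}' file.txt > output.txt
```

Because awk streams the input line by line, memory use stays constant no matter how large the file is.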

Andreas DM
- Thanks; while this solves the CSV part of the problem, what about the 'large' part? And can I output the column to a file using awk? – Pythoner May 11 '16 at 14:48
- Standard I/O redirection will write the results to a file: `awk -F, '{print $2}' file.txt > /some/file/path` – Jason Lee Eaton May 11 '16 at 17:08
- The above solutions using awk won't work without the `-F` flag; awk splits on whitespace by default, not commas. – davlet May 11 '16 at 22:29
- @PythonNewHand all of these methods (awk or cut) process the input file line by line, and are as fast as they can be. So yes, they are perfectly suited for "large" files. – davlet May 12 '16 at 02:50
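As the comment above mentions, `cut` is another line-by-line option; a minimal sketch (file name and sample rows assumed for illustration):

```shell
# cut also streams the input; -d sets the delimiter, -f picks the field
printf 'alice,30,NY\nbob,25,LA\n' > file.txt
cut -d, -f2 file.txt
# prints:
# 30
# 25
```

`cut` is simpler when you only need whole fields; awk is more flexible if you later need filtering or computation on the extracted column.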
I'd add this to Andreas' answer, but I can't comment yet.
With CSV, you have to give awk a field-separator argument, or it will define fields bound by whitespace instead of commas. (Obviously, a CSV that uses a different field separator will need that character declared instead.)
awk -F, '{print $2}' file.txt
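For example, a semicolon-delimited file just needs a different `-F` value (the file name `data.txt` and sample rows are assumed for illustration):

```shell
# Semicolon-delimited input: pass ';' as the field separator
printf 'alice;30\nbob;25\n' > data.txt
awk -F';' '{print $2}' data.txt
# prints:
# 30
# 25
```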

Jason Lee Eaton