Get Rows based on column value from csv

Question

I have a CSV file with the following data:

10.000.00.00,D3,1
10.001.00.00,C4,2
10.002.00.00,C5,2
10.000.88.99,B1,3
10.000.00.00,B2,3
10.000.00.00,C6,3
10.000.99.00,D1,3

I tried the following code:

cat Data.csv | awk -F , '$3 == "3" { print }'

I need to get only the rows whose last value is 3.

Please let me know how to do this.

What's wrong with your code? Your code does exactly what it's supposed to do. Maybe a little awkward. — Cyrus, Apr 01 '19 at 17:19
Assuming your posted code doesn't do what you expected - your input file has DOS line-endings. Use `cat -v Data.csv` to see them and then `dos2unix` or similar to remove them. See https://stackoverflow.com/a/45772568/1745001 for details. — Ed Morton, Apr 01 '19 at 17:25
@Sandy: To avoid the described problem, you can append the following to your mawk or gawk command to handle DOS and Unix line-endings: `-v RS='\n|\r\n'` — Cyrus, Apr 01 '19 at 17:30
@Cyrus `'\n|\r\n'` = `\r?\n`. That approach will fail of course if there truly are supposed to be DOS line endings such as from an Excel export to a CSV where lines end in `\r\n` but can contain `\n`s inside quoted fields. — Ed Morton, Apr 01 '19 at 17:45
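A minimal sketch of the line-ending problem Ed Morton describes, assuming a POSIX shell and awk. The file name crlf.csv is hypothetical. With DOS line endings the third field is really `3\r`, so the question's string test never matches. Stripping one trailing `\r` from each record first makes the original `$3 == "3"` test work. As Ed Morton notes, this is not appropriate when the CSV legitimately uses `\r\n` record endings with embedded newlines inside quoted fields.

```shell
# Sample input with DOS (CRLF) line endings; crlf.csv is a hypothetical file name
printf '10.000.00.00,D3,1\r\n10.000.88.99,B1,3\r\n10.000.00.00,B2,3\r\n' > crlf.csv

# cat -v shows each carriage return as ^M at the end of the line
cat -v crlf.csv

# The question's test fails: the third field is "3\r", not "3"
before=$(awk -F, '$3 == "3"' crlf.csv)

# Strip one trailing \r from the record first; changing $0 makes awk
# re-split it into fields, so $3 is then exactly "3"
after=$(awk -F, '{ sub(/\r$/, "") } $3 == "3"' crlf.csv)

printf 'before: %s\n' "${before:-<no output>}"
printf 'after:\n%s\n' "$after"
rm -f crlf.csv
```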

James Brown · Accepted Answer · score 6 · 2019-04-02T09:06:29.813

Using awk to get only the rows whose last value is 3:

$ awk -F, '$NF==3' file
10.000.88.99,B1,3
10.000.00.00,B2,3
10.000.00.00,C6,3
10.000.99.00,D1,3

Explained:

awk -F, '  # set the field separator to a comma
$NF==3     # NF is the number of fields, so $NF is the last field; print the record if it equals 3
' file     # see the comments for more (thanks @kvantour)
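To see what NF and $NF actually hold, a small sketch that prints both for each line. The rows are fed via printf; the first two come from the question and the four-field row is hypothetical. It also shows why $NF is sturdier than $3 if the number of columns ever varies.

```shell
# Two rows from the question plus one hypothetical row with an extra column
fields=$(printf '10.000.00.00,D3,1\n10.000.88.99,B1,3\n10.000.00.00,X9,extra,3\n' |
  awk -F, '{ print NF, $NF }')

# Each output line is: number of fields, then the last field's value
printf '%s\n' "$fields"
```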

edited Apr 02 '19 at 09:06

answered Apr 01 '19 at 16:51

James Brown



This works because **(a)** a numeric comparison is enforced, as fields are both numeric and string at the same time, and **(b)** `$NF` is converted to a numeric value using [`strtod`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtod.html), which ignores trailing unrecognized characters (such as `\r`). – kvantour Apr 02 '19 at 06:41
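A quick way to check kvantour's point, assuming gawk or mawk (other awk implementations may behave differently): feed a single CRLF-terminated record and compare a numeric test against 3 with a string test against "3".

```shell
# One record with a DOS line ending, so its last field is "3\r"
rec='10.000.88.99,B1,3'

# Compared as a number against the numeric constant 3
numeric=$(printf '%s\r\n' "$rec" | awk -F, '$NF == 3 { print "match" }')

# Compared as a string against the string constant "3"
string=$(printf '%s\r\n' "$rec" | awk -F, '$NF == "3" { print "match" }')

echo "numeric comparison: ${numeric:-no match}"
echo "string comparison:  ${string:-no match}"
```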

ctac_ · Answer 2 · score 2 · 2019-04-01T18:02:29.423

You can try sed:

sed '/,3$/!d' infile

If your lines may end with \r, try this (GNU sed understands the \r escape):

sed '/,3\r*$/!d' infile
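The second command relies on sed recognizing `\r`, which GNU sed does but not every sed does. A sketch for other implementations, assuming a POSIX shell: capture a real carriage return in a variable with printf and splice it into the pattern. The file name mixed.csv is hypothetical. It uses -n with p, which keeps the same lines as !d, so that a `!` inside double quotes cannot trigger history expansion in an interactive bash.

```shell
# Hypothetical sample mixing DOS and Unix line endings
printf '10.000.00.00,D3,1\r\n10.000.88.99,B1,3\r\n10.000.99.00,D1,3\n' > mixed.csv

# Store a literal carriage return (command substitution only strips newlines)
cr=$(printf '\r')

# Print only lines ending in ",3" followed by zero or more carriage returns
kept=$(sed -n "/,3${cr}*\$/p" mixed.csv)

# Matching lines are printed unchanged, so any \r is still there (shown as ^M)
printf '%s\n' "$kept" | cat -v
rm -f mixed.csv
```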

edited Apr 01 '19 at 18:02

answered Apr 01 '19 at 17:57

ctac_


Allan · Answer 3 · 2019-04-02T07:42:11.233

Why do we need awk or sed for this kind of operation in the first place? Isn't it overkill?

The OP is asking how to extract the lines that meet a specific condition from the file, without modifying them at all.

grep is THE perfect tool for this.

$ grep ',3$' Data.csv 
10.000.88.99,B1,3
10.000.00.00,B2,3
10.000.00.00,C6,3
10.000.99.00,D1,3

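Or, if you have Windows line endings: grep -E $',3\r?$' Data.csv (grep itself does not treat \r as a carriage return, so the $'...' quoting of bash, ksh, or zsh is used to insert a real one).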

Also, avoid cat FILE | COMMAND whenever possible. Instead, pass the file directly to the command, or redirect the command's stdin from the file (COMMAND < FILE).
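A small sketch of the three invocation styles, using a hypothetical sample.csv built from the question's rows. All three select the same lines; the cat version just adds an extra process that copies the file into a pipe.

```shell
# Hypothetical sample file
printf '10.000.00.00,D3,1\n10.000.00.00,C6,3\n10.000.99.00,D1,3\n' > sample.csv

# cat only copies the file into the pipe for grep to read
via_cat=$(cat sample.csv | grep ',3$')

# grep opens the file itself
via_arg=$(grep ',3$' sample.csv)

# the shell opens the file and connects it to grep's stdin
via_redirect=$(grep ',3$' < sample.csv)

[ "$via_cat" = "$via_arg" ] && [ "$via_arg" = "$via_redirect" ] && echo "same output"
rm -f sample.csv
```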

score 0 · Answer 4 · answered Apr 02 '19 at 08:58

You can use a built-in awk variable for this.

In this case, NF holds the number of fields in the current record, so $NF is the value of the last field:

awk -F, '{if($NF == 3) {print $0} }' Data.csv
10.000.88.99,B1,3
10.000.00.00,B2,3
10.000.00.00,C6,3
10.000.99.00,D1,3
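This explicit if/print program and the pattern-only $NF==3 from the accepted answer select the same rows, because in awk a pattern with no action defaults to printing the whole record ($0). A small sketch comparing the two on a few rows of the question's data:

```shell
# A few rows from the question's data
sample=$(printf '10.000.00.00,D3,1\n10.000.88.99,B1,3\n10.000.00.00,C6,3')

# Explicit form from this answer
explicit=$(printf '%s\n' "$sample" | awk -F, '{ if ($NF == 3) { print $0 } }')

# Pattern with no action: awk's default action prints $0
implicit=$(printf '%s\n' "$sample" | awk -F, '$NF == 3')

[ "$explicit" = "$implicit" ] && echo "identical output"
```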

You can learn more about built-in variables at the following link: Awk Built in Variables
