grep a substring in a column and print the rows that contain that substring in that column

Question

example:

a,bee,a bee
c,fee,c dee
e,dee,e bee 
g,hee,d deen
h,aee,t Dee

If the block above is a table with 3 columns and 5 rows, I want to print the rows that contain 'dee' in the third column. The match should be case insensitive and should only find the exact word (for example, 'deen' is not acceptable). The output should be as follows:

c,fee,c dee
h,aee,t Dee

What would the Bash command look like?

What I have tried is:

awk -F"," '{print $3}' filename | grep -iw 'dee'

but this prints only the third column, and I still need the data in column 2.

Welcome to Stack Overflow (SO). [SO is a question and answer page for professional and enthusiast programmers](https://stackoverflow.com/tour). Please add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself. — Jetchisel, Sep 30 '21 at 12:38
If this is a 3x3 table, does it mean that your separator is the comma `,` e.g. the 1st line has elements "a", "bee" and "a bee" (with a whitespace in the last column)? — vdavid, Sep 30 '21 at 12:41
Is this what you are looking for? https://stackoverflow.com/a/17001897/1581658 (with some toLower added in to ignore case) — SamBob, Sep 30 '21 at 12:55
@SamBob Similar, but I want the rows whose field contains the word 'dee', not only the rows where the whole field is 'dee'. It also needs to be case insensitive and match the word exactly. — ng zu shen, Sep 30 '21 at 13:02

John Goofy · Answer 1 · 2021-09-30T13:16:46.087

1

Assuming your data is in a file named dat, try this:

sed -ne '/[dD]ee$/p' dat

Or, if you prefer awk:

awk '/[dD]ee$/' dat

Or, if you prefer grep:

grep -i 'dee$' dat

The output is then:

c,fee,c dee
h,aee,t Dee

It is worth exploring how regular expressions match patterns. In this case the regular expression is [dD]ee$, which matches dee or Dee at the end ($) of a line.
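Because $ anchors the match to the very end of the line, a trailing space or a Windows-style carriage return after dee is enough to stop a line from matching. The sketch below illustrates this with a hypothetical copy of the sample data: the trailing space on row 2 and the carriage return on row 5 are assumptions added for the demonstration and are not in the original question. Allowing optional whitespace before the anchor is one possible way around it.

```shell
# Hypothetical sample: row 2 ends with a space and row 5 ends with a carriage return.
printf 'a,bee,a bee\nc,fee,c dee \ne,dee,e bee\ng,hee,d deen\nh,aee,t Dee\r\n' > dat

# Allow optional trailing whitespace (which includes \r) before the end of the line.
grep -i 'dee[[:space:]]*$' dat
```

With this sample, the bare pattern dee$ matches no lines, while the relaxed pattern matches rows 2 and 5. Stripping the stray characters from the file first would be an equally reasonable choice.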

edited Sep 30 '21 at 13:16

answered Sep 30 '21 at 13:03


I tried this, but the output is only 'h,aee,t Dee' – ng zu shen Sep 30 '21 at 13:08

Paul Hodges · Answer 2 · 2021-09-30T13:44:22.087

0

c.f. https://www.gnu.org/software/gawk/manual/html_node/Regexp.html

$: awk -F, 'BEGIN{IGNORECASE=1} $3~/\<dee\>/' file # \< & \> are word boundaries
c,fee,c dee
h,aee,t Dee
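IGNORECASE and the \< and \> operators are gawk extensions, so the command above may not behave the same way under other awk implementations. A minimal portable sketch, assuming the sample rows from the question are saved in a file named file, lowercases the third field with tolower and spells the word boundaries out by hand:

```shell
# Sample rows from the question, written to a file named "file".
printf 'a,bee,a bee\nc,fee,c dee\ne,dee,e bee \ng,hee,d deen\nh,aee,t Dee\n' > file

# Lowercase field 3, then require a non-word character (or the field edge)
# on each side of "dee", so that "deen" is rejected.
awk -F, 'tolower($3) ~ /(^|[^a-z0-9_])dee([^a-z0-9_]|$)/' file
```

This prints the same two rows as the gawk version. The character class only needs lowercase letters because the field has already been lowercased.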

Or just with grep -

$: grep -Ei '^([^,]*,){2}[^,]*\<dee\>' file # anchor at line start so the match stays in the 3rd field
c,fee,c dee
h,aee,t Dee

edited Sep 30 '21 at 13:44

answered Sep 30 '21 at 13:38


Both methods work! Thank you. But I might have to figure out how to use regular expressions first to work with my real data. Thanks nonetheless. – ng zu shen Sep 30 '21 at 14:09
Be careful of the [XY Problem](http://xyproblem.info/) when posting a question. Always try to use as close to real data as possible. If you update your question with examples, we can likely help. – Paul Hodges Sep 30 '21 at 15:03
