How to use gawk to print the 3rd column when it is at least 10 characters long, regardless of commas in the string

Question

I have a CSV file where some of the addresses contain a comma, so I can't use

$ awk -F',' 'length($3) >= 10 {print $3}' schools.csv

An example of my data looks like this:

id,name,address
"1","paul","103 avenue"
"2","shawn","108 BLVD, SE"
"3","ryan","MLK drive 1004"

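A quick check of why the plain comma separator fails, as a sketch that should behave the same in any POSIX awk. The helper name `naive_split` is hypothetical and only used for illustration; it reports the field count and third field for whatever records it is fed.

```shell
# Hypothetical helper: split on every comma, quoted or not.
naive_split() {
    awk -F',' '{ printf "NF=%d third=%s\n", NF, $3 }'
}

# Feed it the second sample record.
printf '%s\n' '"2","shawn","108 BLVD, SE"' | naive_split
```

The record is split into four fields instead of three, and the third field is cut off at the embedded comma.
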
As you can see, the address for id 2 contains a comma, so I have to use gawk version 4. So far I've been able to print every row, whether or not it contains a comma, but I only want to print the 3rd column (address) when it is at least 10 characters long. Here is what I have thus far.

    # awk.awk file
    BEGIN {
        FPAT = "([^,]+)|(\"[^\"]+\")"
    }

    {
        print "NF = ", NF
        for (i = 1; i <= NF; i++) {
            printf("$%d = <%s>\n", i, $i)
        }
    }

$ gawk -f awk.awk schools.csv

The desired output would just be:

108 BLVD, SE or "108 BLVD, SE"

@vgersh99 my file is 600,000 lines and it prints every line, but I just want the third column where the string is over 10 characters — , Mar 10 '21 at 15:07
@Shultz, so instead of the `for` loop, do `if (length($3) > 10) print $3`. Is that what you're after? — vgersh99, Mar 10 '21 at 15:11
@WilliamPursell what do you mean? awk FS = " 'length($3) >= 10 {print $3}' ? — , Mar 10 '21 at 15:18

score 0 · Accepted Answer · answered Mar 10 '21 at 15:17

Since you are already using GNU awk, you could use gensub to strip the leading and trailing double quotes before measuring the length:

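$ gawk 'BEGIN {
    # a field is either a run of non-commas or a double-quoted string
    FPAT = "([^,]*)|(\"[^\"]+\")"
}
# measure the field with its surrounding quotes stripped; print it unchanged
length(gensub(/^\"|\"$/,"","g",$3))>=10 {
    print $3
}' file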

Output:

"103 avenue"
"108 BLVD, SE"
"MLK drive 1004"

If you want the output without the double quotes as well, keep the BEGIN block and replace the rule with:

{
    gsub(/^"|"$/,"",$3)
    if(length($3)>=10)
        print $3
}
