I have a CSV file in which some of the addresses contain a comma, so I can't simply split on commas like this:

$ awk -F',' 'length($3) >= 10 {print $3}' schools.csv

An example of my data looks like this:

id,name,address
"1","paul","103 avenue"
"2","shawn","108 BLVD, SE"
"3","ryan","MLK drive 1004"

As you can see, the address for id 2 contains a comma, so I have to use gawk version 4. So far I've been able to print every field of every row, whether or not it contains a comma, but I only want to print the 3rd column (address) when it is longer than 10 characters. Here is what I have so far:

# awk.awk file
    BEGIN {
        FPAT = "([^,]+)|(\"[^\"]+\")"
    }
    
    {
        print "NF = ", NF
        for (i = 1; i <= NF; i++) {
            printf("$%d = <%s>\n", i, $i)
        }
    }
$ gawk -f awk.awk schools.csv

The desired output would just be

108 BLVD, SE or "108 BLVD, SE"

1 Answer

Since you are already using GNU awk, you can use gensub to strip the leading and trailing double quotes before measuring the length (FPAT keeps the quotes as part of the field):

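$ gawk 'BEGIN {
    # a field is either a run of non-commas or a double-quoted string
    FPAT = "([^,]*)|(\"[^\"]+\")"
}
# test the length of $3 with its surrounding quotes stripped
length(gensub(/^"|"$/,"","g",$3))>=10 {
    print $3
}' file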

Output:

"103 avenue"
"108 BLVD, SE"
"MLK drive 1004"

If you want the output without the double quotes as well:

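$ gawk 'BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
}
{
    gsub(/^"|"$/,"",$3)    # strip the surrounding quotes from the address
    if(length($3)>=10)
        print $3
}' file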
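
To see why the quotes need stripping before the length test in both variants, here is a small sketch that prints the raw length of $3, the length without the quotes, and the field itself. It assumes gawk 4 or later is installed; the temporary file, the `bare` variable and the `NR > 1` header skip are my own additions for illustration, and the sample rows are the ones from the question.

```shell
#!/bin/sh
# Sample data copied from the question, written to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,name,address
"1","paul","103 avenue"
"2","shawn","108 BLVD, SE"
"3","ryan","MLK drive 1004"
EOF

# For each data row (NR > 1 skips the header, an assumption of this sketch),
# print: length of $3 with quotes, length without quotes, and $3 itself.
out=$(gawk '
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
NR > 1 {
    bare = gensub(/^"|"$/, "", "g", $3)   # gensub returns a copy, $3 is unchanged
    print length($3), length(bare), $3
}' "$csv")

printf '%s\n' "$out"
rm -f "$csv"
```

With the quotes included, every address is two characters longer, so "103 avenue" would count as 12 characters instead of 10. That does not change the result for a threshold of 10 on this data, but it would for fields close to the limit.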
James Brown