How to use awk to extract a quoted field?

Question

I am using

awk '{ printf "%s", $3 }'

to extract a field from a space-delimited line. Of course, I get partial results when the field is quoted and contains spaces. Can anybody suggest a solution, please?

Show your input file format and your desired output! – ghostdog74 Aug 11 '10 at 15:03

score 5 · Answer 1 · answered Aug 11 '10 at 15:06

Show your input file and desired output next time. To get the quoted fields:

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5
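The same split can be pushed a little further to answer the original $3 question directly. With -F'"' the odd-numbered chunks lie outside the quotes and can be split on blanks, while the even-numbered chunks are the quoted fields themselves. This is a minimal sketch, assuming quotes are balanced, never escaped, and separated from neighboring words by blanks; quoted_field is a hypothetical helper name, not an existing command.

```shell
# quoted_field N [file...]  (hypothetical helper, POSIX sh + POSIX awk)
# Prints the Nth field of each line, counting "double-quoted text" as ONE field.
# Assumes balanced, unescaped double quotes.
quoted_field() {
    n=$1
    shift
    awk -F'"' -v n="$n" '{
        k = 0                              # logical fields seen so far
        for (i = 1; i <= NF; i++) {
            if (i % 2 == 0) {
                # Even chunks lie between a pair of quotes: one whole field.
                if (++k == n) { print $i; next }
            } else {
                # Odd chunks lie outside the quotes: split them on blanks.
                m = split($i, w, " ")
                for (j = 1; j <= m; j++)
                    if (++k == n) { print w[j]; next }
            }
        }
        print ""                           # the line has fewer than N fields
    }' "$@"
}

# Usage: prints field 3 (without the quotes)
printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | quoted_field 3
```

Note that a word glued directly to a quote, such as abc"def ghi", comes out as two fields (abc and def ghi) with this sketch.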

ghostdog74

Actually, it is an Apache web server log. It seems that awk can't do this easily. – mmonem Aug 11 '10 at 18:09

@mmonem Then this might be useful: http://serverfault.com/questions/11028/do-you-have-any-useful-awk-and-grep-scripts-for-parsing-apache-logs – schot Aug 12 '10 at 11:15
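For an Apache log in particular, splitting on the double quote is often enough, because in the usual "combined" format the request line, the referer and the user agent are the only quoted parts. The sketch below assumes that format and assumes no field contains an escaped \" sequence; the helper names and the sample line are illustrative only.

```shell
# Hypothetical helpers for Apache "combined" format lines.
# Splitting on '"' gives: $2 = request line, $3 = " status bytes ",
# $4 = referer, $6 = user agent.
request_line() { awk -F'"' '{ print $2 }' "$@"; }
user_agent()   { awk -F'"' '{ print $6 }' "$@"; }
# The status code is the first blank-separated word of chunk 3.
status_code()  { awk -F'"' '{ split($3, w, " "); print w[1] }' "$@"; }

# Illustrative sample line in the combined format
sample='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

printf '%s\n' "$sample" | request_line   # GET /apache_pb.gif HTTP/1.0
printf '%s\n' "$sample" | status_code    # 200
printf '%s\n' "$sample" | user_agent     # Mozilla/4.08 [en] (Win98; I ;Nav)
```

If your LogFormat differs, count the quote characters in one line to find which chunk holds the value you want.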

benj · Answer 2 · answered Sep 2 '14 at 15:16

Here's a possible alternative solution. It works by finding the fields that begin or end with a double quote and joining each such run of fields back together. At the end it updates the fields and NF, so if you put more rules after the one that does the merging, you can process the (new) fields using all the normal awk features.

I think this uses only features of POSIX awk and doesn't rely on gawk extensions, but I'm not completely sure.

# This function joins the fields $start to $stop together with OFS, shifting
# subsequent fields down and updating NF.
# Runs of blanks inside the quotes collapse to a single OFS.
# The extra parameters (merged, offs, i) are locals by awk convention, so the
# caller's loop counter i is not overwritten.
#
function merge_fields(start, stop,    merged, offs, i) {
    if (start >= stop)
        return;
    merged = $start;
    for (i = start + 1; i <= stop; i++)
        merged = merged OFS $i;
    $start = merged;

    # Shift the remaining fields left to close the gap, then shrink NF.
    offs = stop - start;
    for (i = start + 1; i <= NF; i++)
        $i = $(i + offs);
    NF -= offs;
}

# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^"/)
            start = i;
        if ($i ~ /"$/)
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning (the loop's i++ brings i back to 1).
            i = 0;
            start = stop = 0;
        }
    }
}

# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}

On an input file like:

thing "more things" "thing" "more things and stuff"

it produces:

Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<

score 1 · Answer 3 · answered Jun 14 '15 at 08:37

If you are only looking for one specific quoted field, then

$ cat file
field1 field2 "field 3" field4 "field5"

awk -F"\"" '{print $2}' file

works. It splits each line at every " character, so $2 is the text inside the first pair of quotes (field 3 in the example above).
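Because the chunks alternate between outside and inside the quotes, the Nth quoted string is always field 2N with this separator. A small sketch that makes N a parameter; nth_quoted is a hypothetical name, and it assumes balanced, unescaped quotes.

```shell
# nth_quoted N [file...]  (hypothetical helper)
# With FS set to a double quote, quoted strings land in the even fields
# $2, $4, $6, ..., so the Nth quoted string is $(2*N).
nth_quoted() {
    n=$1
    shift
    awk -F'"' -v n="$n" '{ print $(2 * n) }' "$@"
}

printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | nth_quoted 2   # field5
```

If the line has fewer than N quoted strings, $(2*N) does not exist and an empty line is printed.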

Alan Swindells

score 1 · Accepted Answer · answered Aug 11 '10 at 14:31

This is actually quite difficult. I came up with the following awk script that splits the line manually and stores all fields in an array.

{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}
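To reuse this for other field numbers, the same match()-based loop can be wrapped in an awk function that takes N as a parameter and returns as soon as it reaches that field. This is a sketch, not the answerer's code: nth_field and field_n are hypothetical names, and like the original it assumes balanced, unescaped double quotes.

```shell
# nth_field N [file...]  (hypothetical helper)
nth_field() {
    want=$1
    shift
    awk -v want="$want" '
    # Return field n of s, counting "quoted text" as one field (quotes removed).
    # Parameters after the extra spaces are locals, by awk convention.
    function field_n(s, n,    k, c, j, t, qs, ql) {
        k = 0
        while (match(s, /"[^"]*"/) > 0) {
            qs = RSTART; ql = RLENGTH      # position and length of the quoted string
            # Unquoted words before the quoted string
            c = split(substr(s, 1, qs - 1), t)
            for (j = 1; j <= c; j++)
                if (++k == n) return t[j]
            # The quoted string itself, without its quotes
            if (++k == n) return substr(s, qs + 1, ql - 2)
            s = substr(s, qs + ql)
        }
        # Unquoted words after the last quoted string
        c = split(s, t)
        for (j = 1; j <= c; j++)
            if (++k == n) return t[j]
        return ""                          # fewer than n fields
    }
    { print field_n($0, want) }' "$@"
}

printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | nth_field 3   # field 3
```

Returning early replaces the original's "stop once we have field 3" check, so the remaining text is never split when it is not needed.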

It is quite a complex solution. If there is no simple one-line solution, I'd go for Perl. – mmonem Aug 11 '10 at 18:10
