I am using
awk '{ printf "%s", $3 }'
to extract a field from a space-delimited line. Of course, I get partial results when the field is quoted and contains spaces. Can anybody suggest a solution, please?
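Please show your input file and desired output next time. To get the quoted fields, split each line on the double quote; the quoted strings are then the even-numbered fields: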
$ cat file
field1 field2 "field 3" field4 "field5"
$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5
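To see why the loop above starts at 2 and steps by 2, it can help to print every field that the double-quote separator produces. This is only an illustrative sketch; `show_quote_split` is a made-up helper name, not part of the answer above.

```shell
# Hypothetical helper: list each field awk sees when FS is a double quote.
show_quote_split() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | show_quote_split
# 1: [field1 field2 ]
# 2: [field 3]
# 3: [ field4 ]
# 4: [field5]
# 5: []
```

The odd-numbered fields hold whatever lies outside the quotes (possibly nothing), so only the even-numbered ones are quoted strings.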
Here's a possible alternative solution to this problem. It works by finding the fields that begin or end with quotes, and then joining those together. At the end it updates the fields and NF, so if you put more patterns after the one that does the merging, you can process the (new) fields using all the normal awk features.
I think this uses only features of POSIX awk and doesn't rely on gawk extensions, but I'm not completely sure.
# This function joins the fields $start to $stop together with OFS, shifting
# subsequent fields down and updating NF.
# The extra parameters i, merged and offs serve as local variables.
#
function merge_fields(start, stop,    i, merged, offs) {
    #printf "Merge fields $%d to $%d\n", start, stop;
    if (start >= stop)
        return;
    merged = "";
    for (i = start; i <= stop; i++) {
        # Test the position, not the value, so a first field of 0 is kept.
        if (i > start)
            merged = merged OFS $i;
        else
            merged = $i;
    }
    $start = merged;

    offs = stop - start;
    for (i = start + 1; i <= NF; i++) {
        #printf "$%d = $%d\n", i, i+offs;
        $i = $(i + offs);
    }
    NF -= offs;
}
# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if (match($i, /^"/))
            start = i;
        if (match($i, /"$/))
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning.
            i = 0;
            start = stop = 0;
        }
    }
}
# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}
On an input file like:
thing "more things" "thing" "more things and stuff"
it produces:
Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<
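Note that the merged fields above keep their surrounding quotes. If only one field is needed, a single-pass variant of the same idea might look like the sketch below. It collects the fields in an array instead of rewriting $0, and strips the quotes at the end. The names `quoted_field`, `want`, `f` and `open` are made up for illustration, and the sketch assumes quotes are balanced and never escaped or nested. Like the script above, it joins the pieces with a single space, so runs of whitespace inside quotes are collapsed.

```shell
# Hypothetical helper: print field number $1 of each input line, treating a
# "quoted string" as one field and removing its quotes.
quoted_field() {
    awk -v want="$1" '
    {
        n = 0                # number of logical fields collected so far
        open = 0             # 1 while inside an unfinished quoted string
        split("", f)         # clear the array left over from the previous line
        for (i = 1; i <= NF; i++) {
            if (open) {
                f[n] = f[n] " " $i          # continue the quoted field
                if ($i ~ /"$/)
                    open = 0                # closing quote reached
            } else {
                f[++n] = $i                 # start a new logical field
                # Opening quote without a closing quote in the same word
                if ($i ~ /^"/ && $i !~ /^".*"$/)
                    open = 1
            }
        }
        s = f[want + 0]
        gsub(/^"|"$/, "", s)                # drop the surrounding quotes
        print s
    }'
}

printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | quoted_field 3
# prints: field 3
```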
If you are only looking for one specific quoted field, this works:
$ cat file
field1 field2 "field 3" field4 "field5"
$ awk -F"\"" '{print $2}' file
field 3
It splits each line on the double-quote character, so field 2 is the text inside the first pair of quotes, which is the one you want in the example above.
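One caveat worth illustrating: with the double quote as separator, $2 is always the first quoted string on the line, wherever it sits, so it only coincides with "the third field" when the first two fields are unquoted. A small sketch (the helper name `first_quoted` is made up for this example):

```shell
# Hypothetical helper: print the first quoted string on each line.
first_quoted() {
    awk -F'"' '{ print $2 }'
}

# The quoted string is the 4th space-delimited field here, yet it is still $2.
printf '%s\n' 'a b c "d e"' | first_quoted     # prints: d e

# Here the quoted string comes first; $1 is empty and $2 is the quoted text.
printf '%s\n' '"a b" c d' | first_quoted       # prints: a b
```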
This is actually quite difficult. I came up with the following awk
script that splits the line manually and stores all fields in an array.
{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}
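The script above hardcodes field 3. As a rough sketch, the same manual-splitting approach can be wrapped in a shell function that takes the field number as an argument. The names `nth_field` and `want` are assumptions made for this example; as in the script above, quoted strings lose their quotes and escaped quotes are not handled.

```shell
# Hypothetical wrapper: print field number $1, counting a "quoted string"
# as one field and dropping its quotes.
nth_field() {
    awk -v want="$1" '
    {
        s = $0
        i = 0
        split("", a)                      # clear the array for this line
        while (i < want && match(s, /"[^"]*"/)) {
            # Unquoted fields before the quoted string
            n = split(substr(s, 1, RSTART - 1), t)
            for (j = 1; j <= n; j++)
                a[++i] = t[j]
            # The quoted string itself, without its quotes
            a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
            s = substr(s, RSTART + RLENGTH)
        }
        # Whatever is left after the last quoted string that was consumed
        n = split(s, t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        print a[want + 0]
    }'
}

printf '%s\n' 'field1 field2 "field 3" field4 "field5"' | nth_field 3
# prints: field 3
```

If the requested field does not exist on a line, this sketch prints an empty line for it.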