Here is a non-numeric way in Awk. This works if we have an Awk that supports an RS variable more than one character long. We break the data into records based on the blank-line separation: "\n\n". Inside these records, we break fields on newlines. Thus $1 is the word, $2 is the definition, $3 is the quote and $4 is the source:
awk 'BEGIN {OFS=FS="\n";ORS=RS="\n\n"} $1=$1" >>"'
We use the same output separators as input separators. Our only pattern/action step is then to edit $1 so that it has >> on it. The assignment itself serves as the pattern: its value is the new, non-empty $1, which counts as true. The default action is { print }, which is what we want: print each record. So we can omit it.
Shorter: initialize RS from the catenation of FS with itself.
awk 'BEGIN {OFS=FS="\n";ORS=RS=FS FS} $1=$1" >>"'
This is nicely expressive: it says that the format uses two consecutive field separators to separate records.
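If the Awk at hand lacks multi-character RS, a portable variation is POSIX paragraph mode: setting RS to the empty string makes blank lines separate records. This is a swapped-in technique, not the one above. A minimal sketch, assuming a made-up two-entry sample; the names `sample` and `out` are only for illustration:

```shell
# Hypothetical two-entry sample in the word/definition/quote/source layout.
sample='apple
a fruit
"An apple a day"
Proverb

ice cream
a frozen dessert
"I scream"
Anonymous'

# RS="" selects POSIX paragraph mode: blank lines separate records.
# FS="\n" makes each line of a record one field.
out=$(printf '%s\n' "$sample" |
  awk 'BEGIN { RS = ""; FS = OFS = "\n"; ORS = "\n\n" }
       { $1 = $1 " >>"; print }')

printf '%s\n' "$out"
```

One difference to keep in mind: paragraph mode treats any run of blank lines as a single separator, so repeated blank lines in the input are not preserved.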
What if we use a flag, initially zero, which is reset on every blank line? This solution still doesn't depend on a hard-coded number, just the blank line separation. The rule fires on the first line, because the uninitialized C evaluates to zero, and then after every blank line, because we reset C to zero:
awk 'C++?1:$0=$0" >>";!NF{C=0}'
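Spelled out with separate rules, the same flag logic might look like the sketch below. The sample data and the names `sample`, `out` and `c` are made up for illustration:

```shell
# Hypothetical sample: two entries separated by a blank line.
sample='apple
a fruit
"An apple a day"
Proverb

ice cream
a frozen dessert
"I scream"
Anonymous'

out=$(printf '%s\n' "$sample" |
  awk '
    c++ == 0 { $0 = $0 " >>" }  # first line of a block: add the suffix
    { print }                   # print every line, edited or not
    NF == 0 { c = 0 }           # blank line: the next line starts a block
  ')

printf '%s\n' "$out"
```

Like the one-liner, this assumes exactly one blank line between entries; a second consecutive blank line would itself receive the suffix.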
Shorter version of accepted Awk solution:
awk '(NR-1)%5?1:$0=$0" >>"'
We can use a ternary conditional expression cond ? then : else as a pattern, leaving the action empty so that it defaults to {print}, which of course means {print $0}. If the zero-based record number is not congruent to 0, modulo 5, then we produce 1 to trigger the print action. Otherwise we evaluate $0=$0" >>" to add the required suffix to the record. The result of this expression is also a Boolean true, which triggers the print action.
Shave off one more character: we don't have to subtract 1 from NR and then test for congruence to zero. Basically, whenever the 1-based record number is congruent to 1, modulo 5, we want to add the >> suffix:
awk 'NR%5==1?$0=$0" >>":1'
Though we have to add ==1 (+3 chars), we win because we can drop two parentheses and -1 (-4 chars).
We can do better (with some assumptions): instead of editing $0, we can create a second field which contains >> by assigning to the parameter $2. The implicit print action will print this, offset by a space:
awk 'NR%5==1?$2=">>":1'
But this only works when the word line contains a single word, because the assignment overwrites whatever is already in $2. If any of the words in this dictionary are compound nouns (separated by space, not hyphenated), this fails. If we try to repair this flaw, we are sadly brought back to the same length:
awk 'NR%5==1?$++NF=">>":1'
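To see the difference between the two field assignments, here is a small sketch on made-up headwords. The variable names are hypothetical, and $(NF + 1) is used as a more explicit spelling of $++NF:

```shell
# Assigning $2 works for a one-word headword...
one=$(printf 'apple\n'     | awk '{ $2 = ">>"; print }')
# ...but clobbers the second word of a compound.
two=$(printf 'ice cream\n' | awk '{ $2 = ">>"; print }')
# Appending a new last field keeps every existing word.
fix=$(printf 'ice cream\n' | awk '{ $(NF + 1) = ">>"; print }')

printf '%s\n' "$one" "$two" "$fix"
# apple >>
# ice >>
# ice cream >>
```

Note that any field assignment rebuilds $0 with OFS, so runs of spaces or tabs in the edited line collapse to single spaces.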
Slight variation on the approach: instead of trying to tack >> onto the record or last field, why don't we conditionally install " >>\n" as ORS, the output record separator?
awk 'ORS=(NR%5==1?" >>\n":"\n")'
Not the tersest, but worth mentioning. It shows how we can dynamically play with some of these variables from record to record.
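Written as an explicit action, the ORS switch might be sketched like this. The placeholder lines w1, d1 and so on are made up, as is the name `out`:

```shell
# Seven made-up lines: one full entry, a blank line, the start of a second entry.
out=$(printf '%s\n' w1 d1 q1 s1 '' w2 d2 |
  awk '{
    # Choose the separator for this record only, then print with it.
    ORS = (NR % 5 == 1) ? " >>\n" : "\n"
    print
  }')

printf '%s\n' "$out"
```

Since $0 is never touched, the line's original spacing is preserved; only the terminator written after it changes.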
A different way of testing NR == 1 (mod 5): namely, regexp!
awk 'NR~/[16]$/?$0=$0" >>":1'
Again, not the tersest, but seems worth mentioning. We can treat NR as a string representing the integer as decimal digits. If it ends with 1 or 6 then it is congruent to 1, mod 5. Obviously, not easy to modify to other moduli, not to mention computationally disgusting.
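The last-digit claim is easy to check by brute force. A quick sketch, where the range 1..1000 and the names `n`, `bad` and `mismatches` are arbitrary choices for illustration:

```shell
# Count the values of n for which the regexp test and the modulus test disagree.
mismatches=$(awk 'BEGIN {
  for (n = 1; n <= 1000; n++)
    if ((n ~ /[16]$/) != (n % 5 == 1))
      bad++
  print bad + 0    # bad is unset if every n agreed, so force a number
}')

printf '%s\n' "$mismatches"
```

The two tests agree because 10 is a multiple of 5, so only the last decimal digit affects the remainder.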