I have a data file for fortune that contains many repeated fortunes. I would like to remove them.
Fortunes are delimited by % lines, so a sample fortune file may look like this:
%
This is sample fortune 1
%
This is
sample fortune 2
%
This fortune
is repeated
%
This is sample fortune 3
%
This fortune
is repeated
%
This fortune
is unique
%
As you can see, fortunes can span multiple lines, rendering the solutions here useless.
What can I do to find and remove the repeated fortunes? I thought about just finding a way to make awk ignore lines beginning with %, but some fortunes share identical lines without being the same overall (such as the last two in my example), so that is not enough.
I've been trying to solve this with awk so far, but any tool is fine.
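To make the goal concrete, here is a rough sketch in Python of the behaviour I am after: treat everything between two % lines as one record and keep only the first copy of each record. The function name dedupe_fortunes is made up for illustration, and the sketch assumes a delimiter is a line containing nothing but % and that fortunes must match exactly (including whitespace) to count as repeats.

```python
def dedupe_fortunes(text):
    """Return the fortune text with repeated %-delimited fortunes removed.

    The first occurrence of each fortune is kept and the original order
    is preserved. Assumes a delimiter is a line containing only "%".
    """
    seen = set()     # fortune bodies already emitted
    unique = []      # fortune bodies in first-seen order
    current = []     # lines of the fortune being collected

    # The extra "%" acts as a sentinel so the last fortune is flushed
    # even if the file does not end with a delimiter line.
    for line in text.splitlines() + ["%"]:
        if line.strip() == "%":
            body = "\n".join(current)
            current = []
            # Skip empty records (e.g. before the leading "%") and repeats.
            if body.strip() and body not in seen:
                seen.add(body)
                unique.append(body)
        else:
            current.append(line)

    if not unique:
        return ""
    # Rebuild the file in the same shape as the sample: a "%" line
    # before every fortune and one closing "%" line at the end.
    return "".join("%\n" + body + "\n" for body in unique) + "%\n"


sample = (
    "%\nThis is sample fortune 1\n"
    "%\nThis fortune\nis repeated\n"
    "%\nThis fortune\nis repeated\n"
    "%\nThis fortune\nis unique\n%\n"
)
print(dedupe_fortunes(sample), end="")
```

Because whole records are compared rather than single lines, "This fortune / is repeated" and "This fortune / is unique" are treated as different fortunes even though they share a first line. After rewriting the data file, the index that fortune uses would presumably need to be regenerated with strfile.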