I have a document whose lines are separated by "\t\n". Records are separated either by "\t", OR by "\n".
Normally, this should be a straightforward awk query:
BEGIN {
RS='\t\n';
}
{
print;
print "Next entry:";
}
However, on a Mac, regular expressions do not seem to be supported (maybe I'm not doing something right?). So I tried RS="\t\n"; however, this is interpreted as RS='\t | \n'. Similar problems occur when running awk from the command line:
awk 1 RS='\t\n' ORS='abc' input > output
This replaces the \t's, but leaves the \n's alone.
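For context, POSIX leaves the behaviour of a multi-character RS unspecified, so an awk that uses only the first character would match what I see. A minimal sketch that avoids RS altogether: it keeps the default newline separator and treats a line ending in a tab as the end of a record. The function name join_records and the marker argument are hypothetical, and forcing LC_ALL=C is an assumption meant to keep the stray f6 byte from tripping a UTF-8 locale.

```shell
# Hypothetical helper: reads stdin, replaces each tab-newline pair with the
# marker given as $1, and leaves lone newlines and lone tabs untouched.
join_records() {
  LC_ALL=C awk -v marker="$1" '
    {
      if (sub(/\t$/, ""))            # line ended in a tab: end of a record
        printf "%s%s", $0, marker
      else                           # plain newline inside a record: keep it
        print
    }'
}

# Small made-up input, not the real file.
printf 'A\tB\t\nC\nD\t\n' | join_records 'abc'
```

Note that awk applies escape processing to values passed with -v, so a marker containing backslashes would need them doubled.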
Next try: using tr. This obviously fails for sequences of more than one character, since \t and \n are both used individually in the rows.
Next:
sed -e '/\t\n/s//NextEntry:/g' input > output
However, this doesn't work. Entering any ASCII character sequence instead of \t\n works.
I read the manual. It says that \t is not supported in sed strings. Fair enough:
sed -e '/\x9\xa/s//abc/' input > output
Still doesn't work. Idea: use tr to replace \t and \n by characters unused in the input file, use sed to change them to what I want, and then tr to change the remaining characters back to what they should be.
tr: Illegal byte sequence
It turns out that the f6 character makes tr fail completely.
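For reference, here is a minimal sketch of the tr/sed/tr round trip described above. It rests on two assumptions: that the bytes 0x1c and 0x1d (octal 034 and 035) never occur in the input, and that running every stage under LC_ALL=C makes the tools treat f6 as a plain byte instead of an invalid multibyte sequence. The function name mark_entries is hypothetical.

```shell
# Hypothetical helper: reads stdin, writes the converted text to stdout.
# Assumption: bytes 0x1c and 0x1d do not appear anywhere in the input.
mark_entries() {
  LC_ALL=C tr '\t\n' '\034\035' |                           # hide tabs and newlines
    LC_ALL=C sed "s/$(printf '\034\035')/NextEntry:/g" |    # tab+newline pair
    LC_ALL=C tr '\034\035' '\t\n'                           # restore the rest
}

# Small made-up input that ends with the problematic 0xf6 byte (octal 366).
printf 'GROUP\tSALARY\t\nJOB LEVEL\n\366' | mark_entries
```

After the first tr the whole file is a single line as far as sed is concerned, so this is probably only reasonable for files that fit comfortably in memory.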
I went through the suggestions in "Sed not recognizing \t instead it is treating it as 't' why?". Those might work for the replacement string (except the "Pasting tab into command prompt via CTRL+V" suggestion; the shell just rejected that paste), but they did not seem to help in my case.
Maybe it's because it's a Mac? Maybe it's because that's the text I'm looking for, not replacing with? Maybe it's the combination with \n?
Any other suggestions?
UPDATE:
I found the thread "How can I replace a newline (\n) using sed?". Apparently, I am unable even to replace a \n by the string "abc" using the suggestions in that thread.
EDIT: Hex head of source file:
5a 20 4e 4f 09 0a 41 53 20 4f 46 20 30 31 2d 30
34 2d 30 35 20 45 4d 50 4c 4f 59 45 45 0a 47 52
4f 55 50 09 48 49 52 45 20 44 41 54 45 09 53 41
4c 41 52 59 09 4a 4f 42 20 54 49 54 4c 45 09 0a
4a 4f 42 20 4c 45 56 45 4c 0a 53 45 52 49 45 53
09 41 50 50 54 20 54 59 50 45 09 0a 50 41 59 20
53 54 41 54 55 53 0a f6
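To make this reproducible, the bytes in the hex head can be written back out with printf and checked with od. This is only a sketch: sample.txt is a hypothetical file name, and it contains just the bytes shown above, not the rest of the real file.

```shell
# sample.txt is a hypothetical file name; \366 is the octal form of byte 0xf6.
printf 'Z NO\t\nAS OF 01-04-05 EMPLOYEE\nGROUP\tHIRE DATE\tSALARY\tJOB TITLE\t\nJOB LEVEL\nSERIES\tAPPT TYPE\t\nPAY STATUS\n\366' > sample.txt

# Dump the file as hex bytes to compare against the listing above.
od -An -tx1 sample.txt
```

Any candidate awk, sed, or tr command can then be run against sample.txt instead of the full document.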