
I have a document whose lines are separated by "\t\n". Records are separated either by "\t", OR by "\n".

Normally, this should be a straightforward awk script:

BEGIN {
   RS='\t\n';
}
{
   print;
   print "Next entry:";
}

However, on a Mac, regular expressions do not seem to be supported in RS (or maybe I'm doing something wrong?). So I tried RS="\t\n"; however, this is interpreted as RS='\t | \n'. I get similar problems running awk from the command line:

awk 1 RS='\t\n' ORS='abc' input > output

replaces the \t's, but leaves the \n's be.

Next try: using tr. This obviously fails for sequences of more than one character, since \t and \n are both used individually in the rows.

Next:

sed -e '/\t\n/s//NextEntry:/g' input > output

However, this doesn't work. Using any plain ASCII character sequence in place of \t\n works.

I read the manual. It says that \t is not supported in sed strings. Fair enough:

sed -e '/\x9\xa/s//abc/' input > output

Still doesn't work. Idea: use tr to replace \t and \n by characters unused in the input file, use sed to change them to what I want, and then tr to change the remaining characters back to what they should be.

tr: Illegal byte sequence

It turns out that the f6 byte in the file makes tr fail outright.

I went through the suggestions in "Sed not recognizing \t instead it is treating it as 't' why?". They might work for replacement strings (except the "Pasting tab into command prompt via CTRL+V" suggestion; the shell just rejected that paste), but they did not seem to help in my case.

Maybe it's because it's a Mac? Maybe it's because that's the text I'm looking for, not replacing with? Maybe it's the combination with \n?

Any other suggestions?

UPDATE:

I found the thread "How can I replace a newline (\n) using sed?". Apparently, I am unable even to replace a \n with the string "abc" using the suggestions in that thread.

EDIT: Hex head of source file:

5a 20 4e 4f 09 0a 41 53  20 4f 46 20 30 31 2d 30
34 2d 30 35 20 45 4d 50  4c 4f 59 45 45 0a 47 52  
4f 55 50 09 48 49 52 45  20 44 41 54 45 09 53 41 
4c 41 52 59 09 4a 4f 42  20 54 49 54 4c 45 09 0a  
4a 4f 42 20 4c 45 56 45  4c 0a 53 45 52 49 45 53  
09 41 50 50 54 20 54 59  50 45 09 0a 50 41 59 20  
53 54 41 54 55 53 0a f6
Alex

1 Answer


Unfortunately, BSD awk, as used on macOS, doesn't support multi-character record separators (RS) at all (in line with POSIX): only a single, literal character is supported.
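A possible workaround that stays within the single-character RS limit is to keep the default newline separator and treat any line ending in a tab as a "\t\n" boundary. The following is only a sketch: the function name mark_entries is made up for illustration, the "NextEntry:" marker is taken from the question's sed attempt, and LC_ALL=C is an assumption added so that stray bytes such as f6 are handled as plain bytes.

```shell
# mark_entries is a hypothetical helper name used only for this sketch.
# It reads the named files (or stdin) and replaces each "\t\n" with "NextEntry:".
mark_entries() {
  # RS keeps its default value (newline), so every awk record is one line.
  # LC_ALL=C makes awk treat the input as raw bytes.
  LC_ALL=C awk '{
    if (sub(/\t$/, "")) {
      # The line ended in a tab: the tab is now stripped, so emit the
      # marker instead of the newline that followed it.
      printf "%sNextEntry:", $0
    } else {
      # Ordinary line: print it with its newline.
      print
    }
  }' "$@"
}

# Usage with a small sample modelled on the hex dump in the question:
printf 'Z NO\t\nAS OF\nGROUP\tHIRE DATE\t\nJOB LEVEL\n' | mark_entries
```

Single tabs and single newlines inside an entry are left untouched; only a tab immediately followed by a newline is replaced.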

BSD sed, as also used on macOS, supports only \n in regexes - any other escapes, including hex ones (e.g., \x09) are not supported.
See this answer of mine for a comprehensive comparison of GNU and BSD sed.

Assuming that your sed command works in principle, you can use an ANSI C-quoted string ($'\t') to splice a literal tab character into your sed script (this assumes bash (the macOS default shell), ksh, or zsh):

sed -e ':a' -e '$!{N;ba' -e '}' -e '/'$'\t''\n/s//NextEntry:/g'

Note that, in order to replace newlines, you must instruct sed to read the entire file into memory first, which is what -e ':a' -e '$!{N;ba' -e '}' does (the BSD sed-compatible form of the common GNU sed idiom :a;$!{N;ba}).
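For a quick check of the approach, here is a sketch that wraps the same idea in a function and runs it on a small sample. It makes several assumptions: the function name replace_tab_newline is made up, the tab is obtained with printf so the snippet also runs in shells without $'...' quoting, and LC_ALL=C is added because the "Illegal byte sequence" error from the question typically comes from bytes such as f6 that are invalid in a UTF-8 locale.

```shell
# replace_tab_newline is a hypothetical helper name used only for this sketch.
# It reads the named files (or stdin) and replaces each "\t\n" with "NextEntry:".
replace_tab_newline() {
  # Obtain a literal tab portably; command substitution strips only
  # trailing newlines, so the tab survives.
  tab=$(printf '\t')
  # ':a' ... 'ba' loops until the last line, appending each next line with N,
  # so the whole input ends up in the pattern space before s runs.
  # Inside double quotes the shell passes \n through to sed unchanged.
  LC_ALL=C sed -e ':a' -e '$!{N;ba' -e '}' -e "s/${tab}\n/NextEntry:/g" "$@"
}

# Usage with a small sample modelled on the hex dump in the question:
printf 'Z NO\t\nAS OF\nGROUP\tHIRE DATE\t\nJOB LEVEL\n' | replace_tab_newline
```

Because the whole input is held in memory, this is best suited to files of moderate size.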

mklement0