
I have a script that extracts a binary file that has been appended to a bash script. It does so using the following GNU sed syntax: sed -n '/__DATA__/{n;:1;n;p;b1}' /tmp/combined.file > /tmp/binary.file

The combined file is assembled by cat'ing an ISO file onto the end of a bash script. It is then sent over the network to an embedded device, where the ISO file is extracted to a temporary dir and the bash script is executed to install it.

However, on executing this I get the error: sed: unterminated {

Am I missing something here? Is this task possible with BusyBox sed?

Alex Turner
  • Unfortunately, once you get past `s/old/new/` with sed, chances are you're using non-portable language constructs. [edit] your question to show a [mcve] with concise, testable sample input and expected output if you'd like some help to do whatever it is you're trying to do portably such that it'll work on all UNIX boxes. And you might want to add an `awk` tag to your question as it'll probably be easier to do that in awk than in sed. – Ed Morton May 11 '20 at 13:27

1 Answer


I tried the "Second attempt" below with OSX/BSD awk and it failed, just printing up until the first NUL character. So you can't do this job portably with awk or sed.

Here's what should work everywhere given that the POSIX standard says

the input file to tail can be any type

so the input to tail doesn't have to be a POSIX text file (no NULs) and we're exiting from awk before the first NUL is encountered in the input so they should both be happy:

$ tail -n +"$(awk '/^__DATA__$/{print NR+2; exit}' binary.bin)" binary.bin | cat -ev
ER^H^@^@^@M-^PM-^P^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@3��M-^Nռ^@|��f1�f1�fSfQ^FWM-^N�M-^N�R�^@|�^@^F�^@^A��K^F^@^@R�A��U1�0���^Sr^VM-^A�U�u^PM-^C�^At^Kf�^F�^F�B�^U�^B1�ZQ�^H�^S[^O��@PM-^C�?Q��SRP�^@|�^D^@f��^G�D^@^OM-^BM-^@^@f@M-^@�^B��fM-^A>@|��xpu   ��{�D|^@^@�M-^C^@isolinux.bin missing or corrupt.^M$
f`f1�f^C^F�{f^S^V�{fRfP^FSj^Aj^PM-^I�f�6�{��^FM-^H�M-^H�M-^R�6�{M-^H�^H�A�^A^BM-^J^V�{�^SM-^Md^Pfa��^^^@Operating system load error.^M$
^��^NM-^J>b^D�^G�^P<$
u��^X���^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@L^D^@^@^@^@^@^@�K�6^@^@M-^@^@^A^@^@?�M-^K^@^@^@^@^@`^\^@^@�������<R^@^@^@^_^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@U�EFI PART^@^@^A^@\^@^@^@]3�.^@^@^@^@^A^@^@^@^@^@^@^@�_^\^@^@^@^@^@@^@^@^@^@^@^@^@�_^\^@^@^@^@^@Uc�r^Oqc@M-^Rc^F�$LZ�^L^@^@^@^@^@^@^@�^@^@^@M-^@^@^@^@�t
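To sanity-check the tail approach end to end, here is a minimal sketch that builds a tiny combined file and extracts the payload again. The file names, the temp dir and the sample payload are all made up for illustration; only the awk and tail commands come from the answer above.

```shell
#!/bin/sh
# Hypothetical file names and payload, used only for this demonstration.
tmp=$(mktemp -d)

# A small "binary" payload containing NUL bytes and newlines.
printf 'foo\0bar\nbaz\0\n' > "$tmp/payload.bin"

# Assemble: script text, the __DATA__ marker, one blank line, then the payload.
{
  printf '#!/bin/sh\necho hi\nexit 0\n\n__DATA__\n\n'
  cat "$tmp/payload.bin"
} > "$tmp/combined.bin"

# awk only reads the text part and exits at the marker line;
# NR+2 skips the marker and the blank line that follows it.
start=$(awk '/^__DATA__$/{print NR+2; exit}' "$tmp/combined.bin")

# tail copies everything from that line onward without interpreting it.
tail -n +"$start" "$tmp/combined.bin" > "$tmp/extracted.bin"

# Compare byte for byte instead of eyeballing cat -ev output.
cmp "$tmp/payload.bin" "$tmp/extracted.bin" && echo "payload matches"
```

If the extraction is byte-exact, cmp succeeds and the script prints the confirmation message. Using cmp also avoids being misled by control characters in terminal output.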

Second attempt:

Now that I have a better idea what you're trying to do (process a file that consists of POSIX text lines up to a point and can contain NUL characters after that), try this:

$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
foo^@bar^@etc

$ cat tst.awk
/^__DATA__$/ { n=NR + 1 }
n && (NR == n) { RS="\0"; ORS="" }
n && (NR > n)  { print (c++ ? RS : "") $0 }

$ awk -f tst.awk file | cat -ev
foo^@bar^@etc

The above doesn't try to store any input lines containing NUL in memory. Instead, it reads \n-terminated text lines until it reaches the line after the one containing __DATA__, and then switches to reading NUL-terminated records into memory and printing NULs between them on output.

It's still undefined behavior per POSIX (see my comments below) but in theory it should work since it just relies on being able to set one variable (RS) to NUL rather than trying to store input strings that contain NULs. Also, setting RS to NUL has for years been a (flawed) workaround that lets awk scripts read a whole file into memory at once, so being able to set RS to NUL should work in any modern awk.


Using the new sample you provided with the missing blank line after the __DATA__ line added:

$ cat -ev file
#!/bin/bash$
$
echo "I: Awesome Things happened here"$
exit 0$
$
__DATA__$
$
ER^H^@^@^@M-^PM-^P^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@3M-mM-zM-^NM-UM-<^@|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^@|M-?^@^FM-9^@^AM-sM-%M-jK^F^@^@RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F@PM-^CM-a?QM-wM-aSRPM-;^@|M-9^D^@fM-!M-0^GM-hD^@^OM-^BM-^@^@f@M-^@M-G^BM-bM-rfM-^A>@|M-{M-@xpu    M-zM-<M-l{M-jD|^@^@M-hM-^C^@isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-@M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^@Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@L^D^@^@^@^@^@^@M-/KM-66^@^@M-^@^@^A^@^@?M-`M-^K^@^@^@^@^@`^\^@^@M-~M-^?M-^?M-oM-~M-^?M-^?<R^@^@^@^_^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@UM-*EFI PART^@^@^A^@\^@^@^@]3M-%.^@^@^@^@^A^@^@^@^@^@^@^@M-^?_^\^@^@^@^@^@@^@^@^@^@^@^@^@M-J_^\^@^@^@^@^@UcM-)r^Oqc@M-^Rc^FM-2$LZM-p^L^@^@^@^@^@^@^@M-P^@^@^@M-^@^@^@^@M-{t


$ awk -f tst.awk file | cat -ev
ER^H^@^@^@M-^PM-^P^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@3M-mM-zM-^NM-UM-<^@|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^@|M-?^@^FM-9^@^AM-sM-%M-jK^F^@^@RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F@PM-^CM-a?QM-wM-aSRPM-;^@|M-9^D^@fM-!M-0^GM-hD^@^OM-^BM-^@^@f@M-^@M-G^BM-bM-rfM-^A>@|M-{M-@xpu    M-zM-<M-l{M-jD|^@^@M-hM-^C^@isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-@M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^@Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@L^D^@^@^@^@^@^@M-/KM-66^@^@M-^@^@^A^@^@?M-`M-^K^@^@^@^@^@`^\^@^@M-~M-^?M-^?M-oM-~M-^?M-^?<R^@^@^@^_^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@UM-*EFI PART^@^@^A^@\^@^@^@]3M-%.^@^@^@^@^A^@^@^@^@^@^@^@M-^?_^\^@^@^@^@^@@^@^@^@^@^@^@^@M-J_^\^@^@^@^@^@UcM-)r^Oqc@M-^Rc^FM-2$LZM-p^L^@^@^@^@^@^@^@M-P^@^@^@M-^@^@^@^@M-{t

Original answer:

Assuming this question is related to your previous question, this will work using any awk in any shell on every UNIX box:

$ awk '/^__DATA__$/{n=NR+1} n && NR>n' file
3<ED>M-^PM-^PM-^PM-^PM-^

When it finds __DATA__ it sets a variable n to the line number to start printing after and then when n is set prints every line for which the line number is greater than n.

The above was run against this input file from your previous question:

$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
3<ED>M-^PM-^PM-^PM-^PM-^$
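
As a text-only illustration of the n/NR logic in the original answer, here is a small sketch. The temp file and the two payload lines are hypothetical stand-ins for the binary data, chosen so the result is easy to read in any awk.

```shell
#!/bin/sh
# Hypothetical sample input: script lines, the marker, a blank line,
# then two plain-text lines standing in for the payload.
tmp=$(mktemp -d)
printf 'echo "I: Installation finished!"\nexit 0\n\n__DATA__\n\nline one\nline two\n' > "$tmp/file"

# __DATA__ is line 4, so n becomes 5 (the blank line after the marker).
# Every later line with NR > n is printed, so lines 6 and 7.
out=$(awk '/^__DATA__$/{n=NR+1} n && NR>n' "$tmp/file")

printf '%s\n' "$out"
```

Only the two lines after the blank line should be printed. The marker line and the blank line are skipped because NR is not yet greater than n when they are read.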
Ed Morton
  • Thanks for your insight here Ed. This looks to be very well aligned with what I'm trying to achieve. For context, the binary file in question is an installer iso. When using the above awk command, it appears to extract the iso when I then pipe to a file, though it appears to modify the file replacing a number of 0x00 with 0x0a - not sure what's going on there - https://pastebin.com/8dFcYkcF – Alex Turner May 11 '20 at 14:06
  • `0x00` is the ASCII NUL character. Awk and sed are tools for processing text files. [By definition](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403) text files do not contain NUL characters (consider the implementation - if you store input as C-strings, those strings end with a NUL character so then how can you store a string that both ends in NUL and contains NUL?). So by trying to process a file that's not a text file with text-processing tools you're in the realm of undefined behavior so YMMV with whatever solution you try to implement using such tools. – Ed Morton May 11 '20 at 14:48
  • Having said that, I updated my answer to show a way that should be able to handle an input file like yours. – Ed Morton May 11 '20 at 15:48
  • Thanks for the update here, think that's a really clever idea with the NUL handling. Looks like this is now treating the file correctly, however your updated solution is only returning the first _line_ or 38 bytes - I can't seem to work out why - I've uploaded the first KB of the file here for a reproducible example: http://s3.alexturner.co/files/ubuntutmp.iso – Alex Turner May 12 '20 at 04:26
  • There's no `__DATA__` line in that file you uploaded so I'm surprised the tool's outputting anything at all. I appended it to a file that did have a `__DATA__` line and got the output I expected, but I only have GNU awk to test with so idk if you'd get a different result with some other awk. I see some control-Ms in your file - are you sure those aren't just overwriting the output and making it LOOK like you only get 1 output line (see https://stackoverflow.com/q/45772525/1745001)? – Ed Morton May 12 '20 at 04:35
  • Terrible example sorry - I've combined the files together here: http://s3.alexturner.co/files/binary.bin – Alex Turner May 12 '20 at 06:33
  • There's no blank line after your `__DATA__` line in that new sample file so again it doesn't look like the file you asked for help to parse but when I add that missing blank line I get the expected output. I updated my answer to show that. – Ed Morton May 12 '20 at 12:40