
I have to run a third-party program in the background and capture its output to a file. I'm doing this simply with the_program > output.txt. However, the authors of that program decided to be flashy and show the number of processed lines in real time, using \b characters to erase the previous value. So one of the lines in output.txt ends up like Lines: 1(b)2(b)3(b)4(b)5, where (b) is an unprintable character with ASCII code 08. I want that line to end up as Lines: 5.

I'm aware that I can write it as is and post-process the file using AWK, but I wonder whether it's possible to process the control characters in place, by using some kind of shell option or by piping some commands together, so that the line becomes Lines: 5 without having to run any additional commands after the program is done?

Edit:

Just a clarification: what I wrote here is a simplified version. The actual number of lines processed by the program is in the hundreds of thousands, so that string ends up quite long.

Timekiller
  • piping your output to a filter to remove the control characters *is* post-processing. – William Pursell Dec 15 '15 at 06:20
  • [`man colcrt`](http://www.freebsd.org/cgi/man.cgi?query=colcrt&sektion=1&apropos=0&manpath=redhat) – tripleee Dec 15 '15 at 07:28
  • @WilliamPursell Right, I didn't think of that. I basically want a one-liner, so I guess I can try to pipe the value through the same AWK script. Though when piping through the script, the output ends up in the file only when the program ends its execution; it'd be a big plus if lines appeared in the resulting file as soon as the program outputs them. I'm trying my luck with `xargs`, but can't get it to work the way I want yet. – Timekiller Dec 15 '15 at 07:33
  • @tripleee it kind of works - I no longer get long strings full of `\b`, but now I get garbage at the end instead of the expected string. I guess it doesn't interpret `\b ` as "delete character and print space instead". – Timekiller Dec 15 '15 at 08:54
  • *"the output ends up in the file only when the program end"* -- this is not true; it is output when the buffer is flushed. See http://mywiki.wooledge.org/BashFAQ/009 – tripleee Dec 15 '15 at 08:56
  • Backspace is not delete, it is "move cursor back on top". If you want to delete the character under the cursor, that's DEL (ASCII code 127). – tripleee Dec 15 '15 at 08:57
  • colcrt isn't a suitable solution; you can filter the result with several different programs or scripts - see for example [Can I programmatically “burn in” ANSI control codes to a file using unix utils?](http://stackoverflow.com/questions/28269278/can-i-programmatically-burn-in-ansi-control-codes-to-a-file-using-unix-utils/28334291#28334291). – Thomas Dickey Dec 15 '15 at 09:10

2 Answers


Thanks for your comments! I ended up piping the output of that program to the AWK script I linked in the question. I get a well-formed file in the end.

the_program | ./awk_crush.sh > output.txt

The only downside is that I get the output only once the program itself has finished, even though the unprocessed output exceeds 5M and should be passed along in smaller chunks. I don't know the exact reason; perhaps the AWK script waits for EOF on stdin. Either way, on a more modern system I would use

stdbuf -oL the_program | ./awk_crush.sh > output.txt

to process the output line by line. I'm stuck on RHEL4 with expired support, though, so I can use neither stdbuf nor unbuffer. I'll leave it as is; that's fine too.
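
Where stdbuf and unbuffer are unavailable, one partial workaround is to flush from inside the AWK script itself. The sketch below is a variant written under assumptions, not the script used in this answer: the function name crush_and_flush is hypothetical, and it relies on the installed awk supporting fflush(), which gawk and mawk do. It only removes buffering on the AWK side. If the_program buffers its own stdout when writing to a pipe, that delay remains.

```shell
#!/bin/sh
# crush_and_flush: hypothetical helper name, not part of the original answer.
# Collapses "character + backspace" pairs and flushes after every line.
crush_and_flush() {
  awk '
    BEGIN {
      bs = sprintf("%c", 8)      # literal backspace, ASCII 08
      pair = "[^" bs "]" bs      # one non-backspace character, then a backspace
    }
    {
      while ($0 ~ pair)
        gsub(pair, "")
      print
      fflush()                   # write this line out now instead of at exit
    }
  '
}

# Intended use, with the_program standing in for the real command:
#   the_program | crush_and_flush > output.txt
```

Note that awk still works record by record, so a progress counter that is written as one long line can only be cleaned up once its terminating newline arrives.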

The contents of awk_crush.sh are based on this answer, except that the ^H sequences (which are supposed to be ASCII 08 characters entered via VIM commands) are replaced with the escape sequence \b:

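#!/usr/bin/awk -f
# Remove every "character + backspace" pair from a line, then print it.
function crushify(data) {
  # One gsub pass is not enough for runs such as "ab\b\b", so repeat
  # until no pair is left.
  while (data ~ /[^\b]\b/) {
      gsub(/[^\b]\b/, "", data)
  }
  print data
}

# Run crushify on every input line.
{ crushify($0) }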

Basically, it replaces the character before \b and the \b itself with an empty string, and repeats this while any such pair remains in the string, which is just what I needed. It does not handle other escape sequences, though; if that is necessary, there's a more complete SED solution by Thomas Dickey.
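
To see why the loop matters, the following sketch compares a single gsub pass with the repeated version on an input that contains two backspaces in a row. The function names crush_once and crush_loop are hypothetical, and the backspace is built with sprintf("%c", 8) rather than written as \b in the regex, on the assumption that this is the more portable spelling across awk implementations.

```shell
#!/bin/sh
# crush_once and crush_loop are hypothetical names used only for this comparison.

# Single gsub pass: removes each "character + backspace" pair found in one scan.
crush_once() {
  awk 'BEGIN { bs = sprintf("%c", 8); pair = "[^" bs "]" bs }
       { gsub(pair, ""); print }'
}

# Repeated passes: keeps going until no pair is left.
crush_loop() {
  awk 'BEGIN { bs = sprintf("%c", 8); pair = "[^" bs "]" bs }
       { while ($0 ~ pair) gsub(pair, ""); print }'
}

# "ab", two backspaces, then "c"; od -c makes any leftover \b visible.
printf 'ab\b\bc\n' | crush_once | od -c
printf 'ab\b\bc\n' | crush_loop | od -c
```

The single pass removes only the "b" and the first backspace, leaving a backspace between "a" and "c". The loop then removes that remaining pair, so only "c" is left.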

Timekiller

Pipe it to col -b, from util-linux:

the_program | col -b

Or, if the input is a file, not a program:

col -b < input > output
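
A quick way to try this on the sample line from the question is sketched below. The printf input is a stand-in for the_program, and the -x variant is an addition that this answer does not mention: by default col turns runs of spaces into tabs, and -x keeps them as spaces.

```shell
#!/bin/sh
# Feed a sample progress line through col -b.
# -b: drop backspaces and keep only the last character written to each column.
if command -v col >/dev/null 2>&1; then
  printf 'Lines: 1\b2\b3\b4\b5\n' | col -b

  # Optional: add -x if tabs in the result are unwanted.
  printf 'Lines: 1\b2\b3\b4\b5\n' | col -bx
else
  echo 'col is not installed on this system' >&2
fi
```

Because col keeps the last character written to each column rather than deleting text, a shorter value that overwrites a longer one would leave the tail of the longer one in place. For a counter that only grows, as in the question, that should not matter.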

Mentioned in Unix & Linux: Evaluate large file with ^H and ^M characters.

Quasímodo