
The application here is "sanitizing" strings for inclusion in a log file. For the sake of argument, let's assume that 1) colorizing the string at runtime is proper; and 2) I need leading and trailing spaces on screen but excess whitespace removed from the log.

The specific application here is tee-ing into a log file. Not all lines would be colorized, and not all lines would have leading/trailing spaces.

Given this, I want to

  1. Remove all codes, both those setting the color and those resetting it. The reason for this will be apparent in a moment.
  2. Remove leading and trailing whitespace

Searching for how to strip color codes in bash turns up many different approaches. What I have found so far, however, is that nobody seems to address the trailing reset, the $(tput sgr0). In the examples I have seen this is inconsequential, but my additional requirement to strip leading and trailing spaces makes handling it necessary.

Here is my example script which demonstrates the issue:

#!/bin/bash

# Create a string with color, leading spaces, trailing spaces, and a reset
REPLY="$(tput setaf 2)       This is green        $(tput sgr0)"
echo "Colored output:  $REPLY"
# Remove initial color code
REPLY="$(echo "$REPLY" | sed 's,\x1B\[[0-9;]*[a-zA-Z],,g')"
echo "De-colorized output:  $REPLY"
# Remove leading and trailing spaces if present
REPLY="$(printf "%s" "${REPLY#"${REPLY%%[![:space:]]*}"}" | sed -n -e 'l')"
echo "Leading spaces removed:  $REPLY"
REPLY="$(printf "%s" "${REPLY%"${REPLY##*[![:space:]]}"}" | sed -n -e 'l')"
echo "Trailing spaces removed:  $REPLY"

The output is (can't figure out how to color text here, assume the first line is green, subsequent lines are not):

[screen capture of the script output]

I am willing to see the error of my ways, but after about three hours trying different things, I'm pretty sure my google-fu is failing me.

Thanks for any assistance.

Thomas Dickey
LCB

2 Answers


I am willing to see the error of my ways, …

The primary error is that the sed command removes only the Esc[… control sequences, not the Esc(B sequence that is also part of sgr0. It works if you change it to

… | sed 's,\x1B[[(][0-9;]*[a-zA-Z],,g'
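
To see the difference between the two patterns in isolation, here is a minimal sketch. It assumes GNU sed (for the `\x1B` escape) and hard-codes the reset sequence as Esc(B followed by Esc[m, which is what `tput sgr0` emits on many xterm-like terminals; your terminal may differ. The variable names are illustrative only.

```shell
#!/bin/bash
# Hard-coded stand-in for "$(tput sgr0)" on an xterm-like terminal (assumption):
# Esc ( B followed by Esc [ m
esc="$(printf '\033')"
reset="${esc}(B${esc}[m"

# Pattern from the question: only Esc[ sequences are matched
leftover_old="$(printf '%s\n' "$reset" | sed 's,\x1B\[[0-9;]*[a-zA-Z],,g')"
# Widened pattern: Esc[ and Esc( sequences are both matched
leftover_new="$(printf '%s\n' "$reset" | sed 's,\x1B[[(][0-9;]*[a-zA-Z],,g')"

printf 'question pattern leaves %d character(s)\n' "${#leftover_old}"
printf 'widened pattern leaves %d character(s)\n' "${#leftover_new}"
```

With this input the first pattern leaves the three characters Esc ( B behind, and those sit after the trailing spaces, so the spaces are no longer at the end of the string.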

The secondary error is that the sed -n -e 'l' command appends a literal $ to the end of the line, so the former trailing spaces are no longer trailing and are therefore not removed.
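
A small sketch of that second point, assuming GNU sed and using a plain padded string with no color codes (the variable names are made up for illustration). It shows the `$` that `l` adds, and that the two parameter expansions from the question trim correctly on their own once the `sed -n -e 'l'` stage is dropped.

```shell
#!/bin/bash
padded='   This is green   '

# sed's 'l' command prints the line in an unambiguous form and marks its end with '$'
listed="$(printf '%s\n' "$padded" | sed -n -e 'l')"
printf '[%s]\n' "$listed"

# The same trimming expansions as in the question, without piping through sed
trimmed="${padded#"${padded%%[![:space:]]*}"}"      # drop leading whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"    # drop trailing whitespace
printf '[%s]\n' "$trimmed"
```

The first line printed still contains the trailing spaces, now followed by `$`; the second contains only `This is green`.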

Armali
  • Thanks so much! When I look at more complex sed lines, I think sometimes my eyes just go crossed. – LCB Mar 04 '19 at 16:05

This works for me:

$ REPLY="$(tput setaf 2)       This is green        $(tput sgr0)"
$ echo -n $REPLY | od -vAn -tcx1
 033   [   3   2   m                               T   h   i   s
  1b  5b  33  32  6d  20  20  20  20  20  20  20  54  68  69  73
       i   s       g   r   e   e   n                            
  20  69  73  20  67  72  65  65  6e  20  20  20  20  20  20  20
     033   [   m 017
  20  1b  5b  6d  0f
$ REPLY=$(echo $REPLY | sed -r 's,\x1B[\[\(][0-9;]*[a-zA-Z]\s*(.*)\x1B[\[\(].*,\1,g' | sed 's/\s*$//')
$ echo -n $REPLY | od -vAn -tcx1
   T   h   i   s       i   s       g   r   e   e   n
  54  68  69  73  20  69  73  20  67  72  65  65  6e

sed does not support non-greedy matching, which would have made the second sed command unnecessary.
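
One way to sidestep the lack of non-greedy matching is to delete each control sequence individually with a global substitution and trim whitespace in separate expressions, rather than capturing the text between the first and last sequence. The following is only a sketch under some assumptions: GNU sed (for `\x1B` and `\x0F`), and a hard-coded sample that mimics the od dump above, where the reset is Esc[m followed by SI (octal 017). Other terminals may emit other sequences.

```shell
#!/bin/bash
# Stand-in for the dumped string: Esc[32m, padded text, Esc[m, then SI (octal 017)
esc="$(printf '\033')"
si="$(printf '\017')"
sample="${esc}[32m       This is green        ${esc}[m${si}"

clean="$(printf '%s\n' "$sample" |
    sed -e 's,\x1B[[(][0-9;]*[a-zA-Z],,g' \
        -e 's,\x0F,,g' \
        -e 's,^[[:space:]]*,,' \
        -e 's,[[:space:]]*$,,')"

printf '[%s]\n' "$clean"
```

Because every expression is independent, a line with no color codes or no padding passes through unchanged apart from the trimming.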

EDIT: This one should work for the input you have:

$ REPLY="$(tput setaf 2)       This is green        "$'\x1B'"(B$(tput sgr0)"
$ echo -n $REPLY | od -vAn -tcx1
 033   [   3   2   m                               T   h   i   s
  1b  5b  33  32  6d  20  20  20  20  20  20  20  54  68  69  73
       i   s       g   r   e   e   n                            
  20  69  73  20  67  72  65  65  6e  20  20  20  20  20  20  20
     033   (   B 033   [   m 017
  20  1b  28  42  1b  5b  6d  0f
$ REPLY=$(echo "$REPLY" | sed -r -e 's,\x1B[\[\(][0-9;]*[a-zA-Z]\s*([^\x1B]+)\s+\x1B.*,\1,g' -e 's,\s*$,,')
$ echo -n $REPLY | od -vAn -tcx1
   T   h   i   s       i   s       g   r   e   e   n
  54  68  69  73  20  69  73  20  67  72  65  65  6e

I find sed to be much less cryptic (or as non-cryptic as regular expressions can be) compared to bash substitutions. But that's just me :)

sdht0
  • Same issue as before, the second sed does not appear to do anything. Once you apply the sed line to remove leading/trailing spaces you again see the codes that were not removed. – LCB Mar 03 '19 at 11:29
  • It is working for me. You can check with `| od -vAn -tcx1`. For me, the output from the above command just has `This is green`. – sdht0 Mar 04 '19 at 01:54
  • If you use it in the context of the script provided, it does not. @Armali had a solution that worked for me. – LCB Mar 04 '19 at 16:06
  • @LCB: I updated the answer, but it looks like your tput output is different. I added the bracket too. – sdht0 Mar 05 '19 at 21:10
  • I appreciate you coming back and trying to correct your answer. I'm not sure why we get different results but we do. I don't know how to show it here so I've put it in pastebin. I hope that's ok: https://pastebin.com/cPcJi6CU – LCB Mar 06 '19 at 12:52
  • @LCB: Thanks! I edited my answer again, which should work for your input. – sdht0 Mar 10 '19 at 17:12
  • Thank you Siddhartha. As soon as I get back home (on a trip) I’ll check that out. My phone is my only technology currently. – LCB Mar 10 '19 at 22:35
  • Good to know. It is quite specific to the tput output though and is probably not portable without minor changes. – sdht0 Mar 11 '19 at 16:17