89

We want to remove ^[, and all of the escape sequences.

sed is not working and is giving us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command
the Tin Man
hasan
  • 3
    Are you looking for two characters, caret `^` and open square bracket `[`, or are you looking for one character, control-[ (ASCII ESCAPE, 0x1B)? Are you looking to remove the terminal control sequences that follow the ESC character? If so, that is a complex job, and ultimately requires you to know which terminal the control codes were generated for - different terminal types use different control sequences, and for a single terminal type, different commands have different numbers of following characters. – Jonathan Leffler Jun 30 '11 at 14:30
  • This is not _such_ a difficult task - it depends in part on the context. – Graham Nicholls Sep 27 '18 at 08:39

14 Answers

69

Are you looking for ansifilter?


There are two things you can do. First, you can enter a literal escape character (in bash):

Using keyboard entry:

sed 's/Ctrl-vEsc//g'

alternatively

sed 's/Ctrl-vCtrl-[//g'

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
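
To see what removing only the ESC byte leaves behind, here is a minimal sketch. It assumes a POSIX shell and builds the ESC byte with `printf '\033'` instead of `\x1b`, so it does not rely on GNU sed; the variable names and the sample string are made up for illustration.

```shell
# Build a real ESC byte (0x1B); \033 is the octal form, which printf handles portably.
esc=$(printf '\033')

# Hypothetical sample: the word "red" wrapped in colour on/off sequences.
sample=$(printf 'plain \033[31mred\033[0m text')

# Delete only the ESC bytes. The rest of each sequence ("[31m", "[0m") stays behind.
printf '%s\n' "$sample" | sed "s/${esc}//g"
# prints: plain [31mred[0m text
```

The leftover `[31m` and `[0m` show why matching the whole sequence, not just the ESC byte, is usually what you want.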
The Guy with The Hat
sehe
  • also asked: https://stackoverflow.com/questions/17998978/removing-colors-from-output See ansifilter source: https://gitlab.com/saalen/ansifilter/ – mike Oct 22 '22 at 23:06
59

commandlinefu gives the correct answer which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
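
As a hedged sketch of the same substitution for seds that lack `\x1B`, the ESC byte can be produced by `printf` and spliced into the expression. The function name `strip_csi` and the sample input are illustrative, not part of the original answer.

```shell
# strip_csi: remove sequences of the form ESC [ <digits and semicolons> <letter>.
# The ESC byte comes from printf, so no \x escape support is needed in sed.
strip_csi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*[a-zA-Z]//g"
}

# Hypothetical input: a colour sequence, a reset, and an erase-line command.
printf '\033[1;32mok\033[0m \033[2Kdone\n' | strip_csi
# prints: ok done
```

Sequences with other parameter characters (for example `?`) or other introducers (for example `ESC (`) are not covered by this pattern.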
Tom Hale
  • 3
    This works with gnu sed, but is not portable to other sed implementations (e.g., bsd) - because of the \x1B. For other seds, you can use the raw escape character (you can use the ctrl-v prefix to insert a literal escape character on the command line). – Juan Mar 06 '18 at 16:24
  • 11
    Bash also lets you say `sed $'s,\x1B\[[0-9;]*[a-zA-Z],,g'` where the dollar sign before the single quote is significant (it produces a "C-style" string). – tripleee Feb 07 '20 at 09:39
  • 1
    @tripleee - thanks! This helped for macOS (BSD) sed. Extending your example a bit further: sed $'s,[\x01-\x1F\x7F]\[[0-9;]*[a-zA-Z],,g' handles all escape sequences – Goblinhack May 18 '20 at 20:20
  • I found this sed substitution left `^[(B` in /var/log/dnf.log output. The one by AGipson worked better for me. – skierpage Jun 13 '21 at 02:28
  • `\x1B\[[?0-9;]*[a-zA-Z]` with `IGNORECASE` did the trick. As escape codes can be `\x1B[?25` for instance, the `?` being notable here. – Torxed Jan 30 '22 at 21:26
22

I managed with the following for my purposes, but this doesn't include all possible ANSI escapes:

sed -r 's/\x1b\[[0-9;]*m?//g'

This removes m commands, but for all escapes (as commented by @lethalman) use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'
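
The generic pattern can be sketched as below. Two things here are assumptions rather than part of the answer: the ESC byte is built with `printf` so the sketch does not need `\x1b` support, and `LC_ALL=C` is set so that the range `@-~` is interpreted by byte value (0x40 to 0x7E) rather than by the locale's collation order. The function name is hypothetical.

```shell
# strip_csi_generic: remove ESC [ ... <final byte>, where the final byte is any
# character from '@' to '~' and everything before it is a parameter byte.
strip_csi_generic() {
  esc=$(printf '\033')
  # LC_ALL=C makes the bracket range a plain byte range (assumption, see above).
  LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"
}

# Hypothetical input: hide-cursor (with '?'), a 256-colour code, and a reset.
printf 'a\033[?25lb\033[38;5;196mc\033[0m\n' | strip_csi_generic
# prints: abc
```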

Also see "https://stackoverflow.com/questions/7857352/python-regex-to-match-vt100-escape-sequences".

There is also a table of common escape sequences.

Thaodan
Luke H
  • 1
    That only escapes the `m` command. This should be more generic `\x1b\[[^@-~]*[@-~]` – lethalman Sep 02 '15 at 16:13
  • I specifically mentioned that it isn't generic— "...but this doesn't include all possible ANSI escapes..." – Luke H Sep 03 '15 at 08:30
  • 1
    The `[^@-~]*[@-~]` didn't work for me; I needed `[^A-Za-z]*[A-Za-z]` (which seems to match all the required characters in the table) – David Fraser Aug 16 '17 at 10:19
  • 2
    Note that on BSD (Mac OS X) sed doesn’t support ANSI-C escape sequences like `\x1b`. So in these environments one might need to lean on the shell a bit by having it expand the escape byte: `sed 's/'"$(printf '\x1b')"'\[[^@-~]*[@-~]//g'` — Tested on both BSD and GNU sed in bash4, seems to work fine. – Mark G. May 08 '18 at 20:08
  • In my situation of trying to delete the escape sequences in `unbuffer yum search`, I had to do `sed 's/\x1b\(\[\|(\)[^A-Za-z]*[A-Za-z]//g'` (I couldn't get the syntax to work with `-r`). In addition to the adjustment by @DavidFraser, I not only had to remove stuff that started with `\x1b\[` but also stuff that started with `\x1b(`. – Levi Uzodike Sep 02 '21 at 19:12
18

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

soorajmr
  • 1
    So `ansi2txt` appears not to strip bold characters, whereas the answer using `col -b` listed below (perversely) does. Here is a test case to demonstrate this: `diff <(man -Tutf8 tmux | col -b | head | hd) <(man -Tutf8 tmux | ansi2txt | head | hd) ` – Att Righ Mar 08 '17 at 15:27
  • 3
    It looks like piping `ansi2txt` to `col -b` is necessary to remove everything. – Marius Gedminas Jun 27 '19 at 12:43
  • When piping `ansi2txt` to `col -b` you may want to use `col -xb` to prevent spaces getting replaced by tabs. But then tabs will be replaced with spaces. – Daniel F Dec 30 '21 at 07:20
12

I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate all of the ANSI escape sequences.

sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'
AGipson
11

I stumbled upon this post while looking for a way to strip extra formatting from man pages. ansifilter did it, but the output was far from the desired result (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command would be col -bx, for example:

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

(source)

Why this works: (in response to a comment by @AttRigh)

groff produces bold characters like you would on a typewriter: print a letter, move one character back with backspace (you can't erase text on a typewriter), print the same letter again to make the character more pronounced. So simply omitting backspaces produces "SSYYNNOOPPSSIISS". col -b fixes this by interpreting backspaces correctly, quote from the manual:

-b Do not output any backspaces, printing only the last character written to each column position.
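
The overstrike effect can be illustrated without groff. The sketch below does not call col; it emulates the two outcomes with sed on a hand-made string, using a backspace byte produced by `printf '\b'`. The variable names and the sample are illustrative only.

```shell
# A literal backspace byte (0x08).
bs=$(printf '\b')

# Hypothetical typewriter-style bold: each letter, a backspace, the letter again.
bold=$(printf 'S\bSY\bYN\bN')

# Dropping only the backspaces leaves every letter doubled.
printf '%s\n' "$bold" | sed "s/${bs}//g"
# prints: SSYYNN

# Dropping each "character + backspace" pair keeps the last character written
# to each column, which is the effect described for col -b.
printf '%s\n' "$bold" | sed "s/.${bs}//g"
# prints: SYN
```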

gronostaj
  • 1
    It seems to be the `col -b` option that does this. The documentation says that this removes backspaces characters :/, go figure. It is nevertheless the most compact option that I could find that doesn't require one to install things (outside of one's package manager) – Att Righ Mar 08 '17 at 15:31
  • 2
    `i++` for this. Don't reinvent this wheel, folks. See also `colcrt` – tripleee Jun 19 '18 at 05:48
10

You can remove all non printable characters with this:

sed 's/[^[:print:]]//g'
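
A small sketch of what this does to a colour sequence, and of the extended pattern discussed in the comments on this answer. The sample string and the variable name `colored` are made up for illustration.

```shell
# Hypothetical sample: "red" wrapped in a colour sequence and a reset.
colored=$(printf '\033[0;31mred\033[0m')

# Only the non-printable ESC byte is removed; the printable rest of each
# sequence remains in the output.
printf '%s\n' "$colored" | sed 's/[^[:print:]]//g'
# prints: [0;31mred[0m

# Matching the non-printable byte together with the bracketed parameters and
# the final letter removes the whole sequence.
printf '%s\n' "$colored" | sed 's/[^[:print:]]\[[0-9;]*[a-zA-Z]//g'
# prints: red
```

Both commands avoid `\x1b`, which is why they also work with seds that do not support that escape.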

pyjama
  • 1
    On Mac, using sed, this is the only answer that worked to remove the `\x1b` ascii escape characters. – Davos Aug 15 '19 at 02:37
  • 3
    But this only removes the invisible characters; so something like `^[[0;31m` will simply turn into `[0;31m`. – tripleee Feb 07 '20 at 09:38
  • @tripleee you can add `....` and remove them. `'s/[^[:print:]]....//g'` – rth Mar 21 '20 at 19:06
  • 3
    @rth Unclear what you are proposing; trimming exactly four characters is wrong because the escape sequences are different lengths. You'd have to write an escape sequence parser to know how many to remove. – tripleee Mar 23 '20 at 05:20
  • @tripleee Ah yes, agree. But I was just looking for a simple solution to remove color sequences from an output, and for this primitive purpose, it works. Of course, it doesn't work for more general cases. Thank you for catching this. – rth Mar 23 '20 at 19:26
  • 1
    Even then, you can't know whether it's a single-digit or double-digit number, as evidenced by the earlier example. You could wing it with something like `s/[^[:print:]]\[[0-9;]*[A-Za-z]//g` but I'm not sure that's entirely correct either. – tripleee Mar 24 '20 at 04:43
  • If all you want to do is remove non-printables, you can also use `tr -dc '[:print:]'`. – Steen Schütt May 27 '21 at 08:45
  • So, I had some success on MAC with `sed 's/[^[:print:]]\[[0-9;]*[a-zA-Z]//g' old.txt > new.txt` – Yasin Zähringer Aug 20 '21 at 16:16
6

I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.

lunixbochs
3

Just a note; let's say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

In binary, this looks like this:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that while you can match 0x1b with the \x1b escape in sed, you CANNOT do the same for 0x5b, which is the left square bracket [:

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \ - which ends up as \\x5b; but while that "passes" - it doesn't match anything as intended:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[ - the rest of the values can then be entered with escaped \x notation:

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a
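
Another way to write a literal left square bracket, besides `\[`, is a one-character bracket expression, `[[]`. The sketch below uses it to strip the ESC [ K sequence; it builds the ESC byte with `printf` so that it does not depend on `\x1b` support, and the sample line is a shortened, made-up version of the git output above.

```shell
# A real ESC byte (0x1B), produced portably with an octal escape.
esc=$(printf '\033')

# "[[]" is a bracket expression containing only "[", so it matches a literal
# left square bracket without any backslash.
printf 'remote: done\033[K\n' | sed "s/${esc}[[]K//g"
# prints: remote: done
```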
sdaau
3

A sed-based approach that does not need the extended regular expressions enabled by -r:

sed 's/\x1B\[[0-9;]*[JKmsu]//g'
palik
  • 1
    That one nicely filters complex escape codes like `\033[38;2;255;255;255m `, where even `iconv -f "ASCII" -t "UTF-8"` fails. Thanks for posting – NVRM Nov 26 '19 at 06:20
  • 1
    The first 4 solutions didn't work, but this one does! – Bogdan Mart Oct 13 '21 at 08:53
2

Tom Hale's answer left unwanted codes, but was a good base to work from. Adding additional filtering cleared out leftover, unwanted codes:

sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
    -e "s/^[[[][0-9][0-9]*[@]//" \
    -e "s/^[[=0-9]<[^>]*>//" \
    -e "s/^[[)][0-9]//" \
    -e "s/.^H//g" \
    -e "s/^M//g" \
    -e "s/^^H//" \
        file.dirty > file.clean

As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and greater-than character, not Ctrl-<.

TERM=xterm was in use at the time.

To remove PCL codes, add patterns like this:

sed -e "s/^[[&()*][a-z]*[-+]*[0-9][0-9]*[A-Z]//" \
    -e "s/^[[=9EZYz]//" \
        file.dirty > file.clean

Ideally, if the regular expressions are used with an interpreter that understands the ? meta-character, the first pattern is better expressed as:

      "s/^[[&()*][a-z]?[-+]?[0-9][0-9]*[A-Z]//" \
kbulgrien
1

A bash snippet I've been using for stripping out (at least some) ANSI colors:

shopt -s extglob
while IFS='' read -r line; do
  echo "${line//$'\x1b'\[*([0-9;])[Km]/}"
done
rdesgroppes
1

My answer to

What are these weird ha:// URLs jenkins fills our logs with?

removes all ANSI escape sequences from Jenkins console log files effectively (it also deals with Jenkins-specific URLs which wouldn't be relevant here).

I acknowledge and appreciate the contributions of Marius Gedminas and pyjama from this thread in formulating the ultimate solution.

Frank Hoeflich
1

This simple awk solution worked for me; try this:

str="happy $(tput setaf 1)new$(tput sgr0) year!" #colored text
echo $str | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1' #remove ansi colors