How to remove ^[, and all of the escape sequences in a file using linux shell scripting

Question

We want to remove ^[, and all of the escape sequences.

sed is not working and is giving us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command

Are you looking for two characters, caret `^` and open square bracket `[`, or are you looking for one character, control-[ (ASCII ESCAPE, 0x1B)? Are you looking to remove the terminal control sequences that follow the ESC character? If so, that is a complex job, and ultimately requires you to know which terminal the control codes were generated for - different terminal types use different control sequences, and for a single terminal type, different commands have different numbers of following characters. — Jonathan Leffler, Jun 30 '11 at 14:30
This is not _such_ a difficult task - it depends in part on the context. — Graham Nicholls, Sep 27 '18 at 08:39
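
The distinction raised in the comments above can be checked before choosing a pattern. A minimal sketch, assuming a POSIX shell with grep; the function names are illustrative and do not come from the thread:

```shell
# A literal ESC byte (0x1B), produced portably with printf.
esc=$(printf '\033')

# Count input lines that contain a real ESC byte.
count_esc_lines() {
  grep -c "$esc"
}

# Count input lines that contain the two printable characters '^' and '['.
# -F treats the pattern as a fixed string, so neither character is special.
count_caret_bracket_lines() {
  grep -cF '^['
}

printf '\033[31mred\033[0m\n' | count_esc_lines          # input with real escape sequences
printf '^[[31mred^[[0m\n' | count_caret_bracket_lines    # caret notation saved as plain text
```

The sed commands in the question fail for a related reason: in `s/^[//g` the unescaped `[` opens a bracket expression that is never closed, so sed never finds the end of the `s` command.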

score 69 · Answer 1 · edited Nov 06 '18 at 20:32

Are you looking for ansifilter?

You can do one of two things. The first is to enter a literal escape character (in bash):

Using keyboard entry:

sed 's/Ctrl-vEsc//g'

alternatively

sed 's/Ctrl-vCtrl-[//g'

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
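
The `\x1b` notation is a GNU sed extension, so a variant that lets the shell produce the byte may travel better. A minimal sketch, assuming a POSIX shell; the function names are made up for illustration, and the `tr` variant is an alternative to the bracket range above that keeps TAB and newline:

```shell
# Build a literal ESC byte with printf so the sed pattern does not rely on \x1b support.
esc=$(printf '\033')

# Remove only the ESC byte itself; the rest of each sequence (e.g. "[31m") stays behind.
strip_esc_byte() {
  sed "s/${esc}//g"
}

# Remove control characters but keep TAB (octal 011) and newline (octal 012).
strip_controls_keep_tab() {
  tr -d '\001-\010\013-\037\177'
}

printf 'a\033[31mb\tc\n' | strip_esc_byte
printf 'a\033[31mb\tc\n' | strip_controls_keep_tab
```

Both calls leave the `[31m` text in place, which is why the later answers match the whole sequence rather than the ESC byte alone.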

edited Nov 06 '18 at 20:32

The Guy with The Hat

answered Jun 30 '11 at 12:26

sehe

also asked: https://stackoverflow.com/questions/17998978/removing-colors-from-output See ansifilter source: https://gitlab.com/saalen/ansifilter/ – mike Oct 22 '22 at 23:06

score 59 · Answer 2 · answered Apr 26 '17 at 07:37

commandlinefu gives the correct answer which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
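
A small sketch of the same pattern, written so that it does not depend on `\x1B` support in sed. The function name `strip_csi` is hypothetical, and the ESC byte comes from printf rather than from the regex:

```shell
# Literal ESC byte, so the pattern also works with sed versions that lack \x1B.
esc=$(printf '\033')

# Delete ESC [ <digits and semicolons> <one letter>: colours, cursor movement, erase commands.
strip_csi() {
  sed "s,${esc}\[[0-9;]*[a-zA-Z],,g"
}

# Colour on, colour off, and an erase-line command are all removed.
printf '\033[1;31mred\033[0m and \033[2Kplain\n' | strip_csi

# A charset-selection sequence such as ESC ( B is not covered by this pattern and remains.
printf 'x\033(B\033[my\n' | strip_csi | od -c
```

The second call illustrates the leftover `^[(B` that one of the comments below reports.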

answered Apr 26 '17 at 07:37

Tom Hale

This works with GNU sed, but is not portable to other sed implementations (e.g., BSD) because of the \x1B. For other seds, you can use the raw escape character (use the Ctrl-v prefix to insert a literal escape character on the command line). – Juan Mar 06 '18 at 16:24
Bash also lets you say `sed $'s,\x1B\[[0-9;]*[a-zA-Z],,g'` where the dollar sign before the single quote is significant (it produces a "C-style" string). – tripleee Feb 07 '20 at 09:39
@tripleee - thanks! This helped for macOS (BSD) sed. Extending your example a bit further: `sed $'s,[\x01-\x1F\x7F]\[[0-9;]*[a-zA-Z],,g'` handles all escape sequences – Goblinhack May 18 '20 at 20:20
I found this sed substitution left `^[(B` in /var/log/dnf.log output. The one by AGipson worked better for me. – skierpage Jun 13 '21 at 02:28
`\x1B\[[?0-9;]*[a-zA-Z]` with `IGNORECASE` did the trick. As escape codes can be `\x1B[?25` for instance, the `?` being notable here. – Torxed Jan 30 '22 at 21:26

score 22 · Answer 3 · edited Apr 15 '21 at 02:00

I managed with the following for my purposes, but this doesn't include all possible ANSI escapes:

sed -r 's/\x1b\[[0-9;]*m?//g'

This removes m commands, but for all escapes (as commented by @lethalman) use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'
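
The range `@-~` covers the final bytes 0x40 to 0x7E, and how a range is interpreted can depend on the locale, which may explain why it did not work for everyone in the comments. A hedged sketch that pins the C locale and avoids `\x1b`; the function name is illustrative:

```shell
# Literal ESC byte from printf, so no \x1b support is needed in sed.
esc=$(printf '\033')

# ESC [ , then any bytes outside @..~ (parameters such as "?25" or "38;5;196"),
# then one final byte in @..~. LC_ALL=C makes the range a plain byte range.
strip_csi_generic() {
  LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"
}

printf '\033[?25lhidden cursor\033[38;5;196m red\033[0m\n' | strip_csi_generic
```

Unlike the `m`-only pattern, this also removes private-mode sequences such as `ESC[?25l`, but it still does not touch sequences that do not start with `ESC[`.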

Also see "https://stackoverflow.com/questions/7857352/python-regex-to-match-vt100-escape-sequences".

There is also a table of common escape sequences.

edited Apr 15 '21 at 02:00

Thaodan

answered Jun 03 '14 at 01:01

Luke H

That only escapes the `m` command. This should be more generic `\x1b\[[^@-~]*[@-~]` – lethalman Sep 02 '15 at 16:13
I specifically mentioned that it isn't generic: "...but this doesn't include all possible ANSI escapes..." – Luke H Sep 03 '15 at 08:30
The `[^@-~]*[@-~]` didn't work for me; I needed `[^A-Za-z]*[A-Za-z]` (which seems to match all the required characters in the table) – David Fraser Aug 16 '17 at 10:19
Note that on BSD (Mac OS X) sed doesn't support ANSI-C escape sequences like `\x1b`. So in these environments one might need to lean on the shell a bit by having it expand the escape byte: `sed 's/'"$(printf '\x1b')"'\[[^@-~]*[@-~]//g'`. Tested on both BSD and GNU sed in bash4, seems to work fine. – Mark G. May 08 '18 at 20:08
In my situation of trying to delete the escape sequences in `unbuffer yum search`, I had to do `sed 's/\x1b\(\[\|(\)[^A-Za-z]*[A-Za-z]//g'` (I couldn't get the syntax to work with `-r`). In addition to the adjustment by @DavidFraser, I not only had to remove stuff that started with `\x1b\[` but also stuff that started with `\x1b(`. – Levi Uzodike Sep 02 '21 at 19:12

score 18 · Answer 4 · answered May 01 '15 at 16:16

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

answered May 01 '15 at 16:16

soorajmr

So `ansi2txt` appears not to strip bold characters, whereas the answer using `col -b` listed below (perversely) does. Here is a test case to demonstrate this: `diff <(man -Tutf8 tmux | col -b | head | hd) <(man -Tutf8 tmux | ansi2txt | head | hd)` – Att Righ Mar 08 '17 at 15:27
It looks like piping `ansi2txt` to `col -b` is necessary to remove everything. – Marius Gedminas Jun 27 '19 at 12:43
When piping `ansi2txt` to `col -b` you may want to use `col -xb` to prevent spaces getting replaced by tabs. But then tabs will be replaced with spaces. – Daniel F Dec 30 '21 at 07:20

score 12 · Answer 5 · answered Jun 26 '18 at 04:09

I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate the ANSI escape sequences I run into.

sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'
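
Read piece by piece, the expression matches an optional 0x01 byte, an optional `ESC ( B`, then `ESC [` with optional numeric parameters and one of the final letters J, K, m, s or u, and an optional 0x02 byte. A sketch of the same expression with the control bytes supplied by printf instead of `\x` escapes; the function and variable names are illustrative, and `-E` is used as the spelling of `-r` that both GNU and BSD sed accept:

```shell
# Control bytes produced by the shell rather than by sed's \x escapes.
soh=$(printf '\001')
stx=$(printf '\002')
esc=$(printf '\033')

# Same structure as the expression above, using ~ as the s-command delimiter.
strip_prompt_sequences() {
  sed -E "s~${soh}?(${esc}\(B)?${esc}\[([0-9;]*)?[JKmsu]${stx}?~~g"
}

printf '\001\033(B\033[m\002prompt$ \033[0;32mok\033[0m\n' | strip_prompt_sequences
```

Because the `ESC ( B` part is only matched when a CSI sequence follows it, a stand-alone `ESC ( B` would still pass through this filter.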

answered Jun 26 '18 at 04:09

AGipson

This worked for Fedora's /var/log/dnf.log output, Tom Hale's answer left `^[(B` in the output. – skierpage Jun 13 '21 at 02:19

gronostaj · Answer 6 · 2020-07-30T06:54:49.343

I stumbled upon this post while looking for a way to strip extra formatting from man pages. ansifilter did it, but the output was far from the desired result (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command would be col -bx, for example:

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

(source)

Why this works: (in response to a comment by @AttRigh)

groff produces bold characters like you would on a typewriter: print a letter, move one character back with backspace (you can't erase text on a typewriter), print the same letter again to make the character more pronounced. So simply omitting backspaces produces "SSYYNNOOPPSSIISS". col -b fixes this by interpreting backspaces correctly, quote from the manual:

-b Do not output any backspaces, printing only the last character written to each column position.
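
The difference between dropping backspaces and interpreting them can be reproduced without groff. A minimal sketch, assuming `col` is installed (it ships with util-linux on most Linux systems); the helper names are made up for illustration:

```shell
# Overstruck bold as nroff-style tools emit it: a letter, a backspace, the same letter again.
bold_syn() {
  printf 'S\bSY\bYN\bN\n'
}

# Deleting the backspaces alone leaves every bold letter doubled.
drop_backspaces() {
  tr -d '\b'
}

# col -b keeps only the last character written to each column; -x emits spaces instead of tabs.
resolve_overstrike() {
  col -bx
}

bold_syn | drop_backspaces
bold_syn | resolve_overstrike
```

The first pipeline shows the doubled letters described above; the second collapses each overstruck pair to a single character.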

It seems to be the `col -b` option that does this. The documentation says that this removes backspaces characters :/, go figure. It is nevertheless the most compact option that I could find that doesn't require one to install things (outside of one's package manager) — Att Righ, Mar 08 '17 at 15:31
`i++` for this. Don't reinvent this wheel, folks. See also `colcrt` — tripleee, Jun 19 '18 at 05:48

score 10 · Answer 7 · answered Nov 06 '18 at 10:53

You can remove all non-printable characters with this:

sed 's/[^[:print:]]//g'
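
A short sketch of what this does to a coloured line, next to the extended pattern that appears in the comments below. The function names are hypothetical; both commands avoid `\x1b`, which is why they also suit BSD sed:

```shell
# Removes the ESC byte (and any other non-printable byte) but nothing else.
strip_nonprint() {
  sed 's/[^[:print:]]//g'
}

# Removes a non-printable byte together with a following "[<params><letter>" tail.
strip_nonprint_seq() {
  sed 's/[^[:print:]]\[[0-9;]*[a-zA-Z]//g'
}

printf '\033[0;31mred\033[0m\n' | strip_nonprint       # leaves the "[0;31m" and "[0m" text behind
printf '\033[0;31mred\033[0m\n' | strip_nonprint_seq   # removes the whole sequences
```

Note that `[^[:print:]]` also matches TAB, so the first form deletes tabs and the second would delete a tab that happens to precede text shaped like `[1m`.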

answered Nov 06 '18 at 10:53

pyjama

On Mac, using sed, this is the only answer that worked to remove the `\x1b` ascii escape characters. – Davos Aug 15 '19 at 02:37
But this only removes the invisible characters; so something like `^[[0;31m` will simply turn into `[0;31m`. – tripleee Feb 07 '20 at 09:38
@tripleee you can add `....` and remove them. `'s/[^[:print:]]....//g'` – rth Mar 21 '20 at 19:06
@rth Unclear what you are proposing; trimming exactly four characters is wrong because the escape sequences are different lengths. You'd have to write an escape sequence parser to know how many to remove. – tripleee Mar 23 '20 at 05:20
@tripleee Ah yes, agree. But I was just looking for a simple solution to remove color sequences from an output, and for this primitive purpose, it works. Of course, it doesn't work for more general cases. Thank you for catching this. – rth Mar 23 '20 at 19:26
Even then, you can't know whether it's a single-digit or double-digit number, as evidenced by the earlier example. You could wing it with something like `s/[^[:print:]]\[[0-9;]*[A-Za-z]//g` but I'm not sure that's entirely correct either. – tripleee Mar 24 '20 at 04:43
If all you want to do is remove non-printables, you can also use `tr -dc '[:print:]'`. – Steen Schütt May 27 '21 at 08:45
So, I had some success on Mac with `sed 's/[^[:print:]]\[[0-9;]*[a-zA-Z]//g' old.txt > new.txt` – Yasin Zähringer Aug 20 '21 at 16:16

score 6 · Answer 8 · answered May 04 '17 at 06:53

I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.

score 3 · Answer 9 · answered Mar 14 '15 at 17:41

Just a note; let's say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

In binary, this looks like this:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that while you can match 0x1b with the literal notation \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [:

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \, which ends up as \\x5b. That "passes", but it doesn't match anything as intended:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[; the rest of the values can then be entered in escaped \x notation:

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a

score 3 · Answer 10 · answered Nov 26 '19 at 04:16

A sed-based approach that does not need the extended regular expressions enabled by -r:

sed 's/\x1B\[[0-9;]*[JKmsu]//g'

answered Nov 26 '19 at 04:16

palik

That one nicely filters complex escape codes like `\033[38;2;255;255;255m`, where even iconv (`iconv -f "ASCII" -t "UTF-8"`) fails. Thanks for posting – NVRM Nov 26 '19 at 06:20
The first 4 solutions didn't work, but this one does! – Bogdan Mart Oct 13 '21 at 08:53

kbulgrien · Answer 11 · 2023-04-05T21:36:21.630

Tom Hale's answer left unwanted codes, but was a good base to work from. Adding additional filtering cleared out leftover, unwanted codes:

sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
    -e "s/^[[[][0-9][0-9]*[@]//" \
    -e "s/^[[=0-9]<[^>]*>//" \
    -e "s/^[[)][0-9]//" \
    -e "s/.^H//g" \
    -e "s/^M//g" \
    -e "s/^^H//" \
        file.dirty > file.clean

As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and greater-than character, not Ctrl-<.

TERM=xterm was in use at the time.
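
Because literal control characters are easy to lose when a script is copied, the same idea can be written with the bytes generated by printf. This is a sketch covering only a subset of the patterns above (the first, the backspace and the carriage-return rules); the function and variable names are illustrative:

```shell
# Literal control bytes: ESC (^[), backspace (^H) and carriage return (^M).
esc=$(printf '\033')
bs=$(printf '\b')
cr=$(printf '\r')

strip_terminal_noise() {
  sed -e "s,${esc}[[(][0-9;?]*[a-zA-Z],,g" \
      -e "s/.${bs}//g" \
      -e "s/${cr}//g"
}

# Cursor-show sequence, charset selection, an overstruck character and a trailing CR.
printf 'a\033[?25hb\033(Bc\bd\r\n' | strip_terminal_noise
```

The `.${bs}` rule deletes the character before each backspace, so an overstruck pair keeps only the character typed last.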

To remove PCL codes, add patterns like this:

sed -e "s/^[[&()*][a-z]*[-+]*[0-9][0-9]*[A-Z]//" \
    -e "s/^[[=9EZYz]//" \
        file.dirty > file.clean

Ideally, if the regular expressions are used with an interpreter that understands the ? meta-character, the first pattern is better expressed as:

      "s/^[[&()*][a-z]?[-+]?[0-9][0-9]*[A-Z]//" \

score 1 · Answer 12 · answered Apr 26 '19 at 17:34

A bash snippet I've been using for stripping out (at least some) ANSI colors:

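shopt -s extglob  # enables the *(...) pattern used below
# Read stdin line by line and delete ESC [ <digits/semicolons> followed by K or m.
while IFS='' read -r line || [ -n "$line" ]; do  # also handle a last line without a newline
  printf '%s\n' "${line//$'\x1b'\[*([0-9;])[Km]/}"
done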

answered Apr 26 '19 at 17:34

rdesgroppes

Frank Hoeflich · Answer 13 · 2020-06-24T17:50:40.250

1

My answer to "What are these weird ha:// URLs jenkins fills our logs with?" removes all ANSI escape sequences from Jenkins console log files effectively (it also deals with Jenkins-specific URLs, which wouldn't be relevant here).

I acknowledge and appreciate the contributions of Marius Gedminas and pyjama from this thread in formulating the ultimate solution.

edited Jun 24 '20 at 17:50

answered Jun 19 '20 at 23:13

Isuru Sampath · Answer 14 · 2021-11-28T11:26:48.060

1

This simple awk solution worked for me, try this:

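str="happy $(tput setaf 1)new$(tput sgr0) year!" # colored text
# In the regex, "." stands in for the ESC byte; quoting "$str" keeps the shell from word-splitting it.
echo "$str" | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1' # remove ANSI colors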

edited Nov 28 '21 at 11:26

answered Nov 28 '21 at 11:20
