How can I remove the special characters shown in blue in picture 1, such as ^M, ^A, ^@, and ^[? As I understand it, ^M is the carriage return from Windows line endings, and I can use
sed -i 's/^M//g'
to remove it, but that doesn't remove the others. The dos2unix command
doesn't work either. Is there a way to remove all of them?
19

codeforester

vinllen
4 Answers
35
Remove everything except the printable characters (character class [:print:]) with sed:
sed $'s/[^[:print:]\t]//g' file.txt
[:print:] includes:
- [:alnum:] (alphanumerics)
- [:punct:] (punctuation)
- space
The ANSI C quoting ($'') is used so that \t is interpreted as a literal tab inside $'' (in bash and similar shells).
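As a minimal sketch of the filter above in action, the snippet below runs the same sed expression over a made-up sample line. The sample text and the variable names `tab` and `cleaned` are assumptions for illustration. The tab is built with printf rather than $'' so the snippet also runs in shells without ANSI C quoting.

```shell
# Hypothetical sample line containing a tab, ^A (\001), ESC (\033, shown as ^[)
# and a trailing carriage return (\r, shown as ^M).
tab=$(printf '\t')   # a literal tab character, built without $'...' quoting

# Delete every character that is neither printable nor a tab.
cleaned=$(printf 'col1\tcol2\001\033[0m end\r\n' |
  sed "s/[^[:print:]${tab}]//g")

# Dump the surviving bytes so invisible characters would be easy to spot.
printf '%s\n' "$cleaned" | od -c
```

Only the ESC byte of a terminal escape sequence is non-printable, so the printable characters that follow it (here `[0m`) stay in the output and may need a separate substitution.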

heemayl
- I think `tr` would do this faster: `tr -d '[[^:print:]\t]' < file.txt` – chthonicdaemon Mar 30 '17 at 04:56
- @chthonicdaemon: Good idea, but you probably meant `tr -dC '[:print:]\t\n'` (you can't use `^` with `tr`, and the outer `[]` would be taken as characters to match; you also need to preserve `\n`). – mklement0 Mar 30 '17 at 12:53
- @heemayl: Thanks for updating; it's great to show a solution that works with BSD Sed too (and I would keep that solution too), but it's worth noting, given that the question is tagged `linux`, that a regular single-quoted string will do with _GNU_ Sed, which does understand `\t` natively. Alternatively, replacing `\t` with `[:blank:]` would bypass the issue. – mklement0 Mar 30 '17 at 12:56
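The `tr` variant suggested in the comments can be sketched as follows. This is an illustration under assumptions rather than a tested drop-in: `strip_ctrl` is a hypothetical helper name, the input is made up, and `LC_ALL=C` is added here so that the result does not depend on the current locale.

```shell
# strip_ctrl (hypothetical helper): read stdin and delete every byte that is
# not printable, a tab, or a newline. -c complements the set, -d deletes.
strip_ctrl() {
  LC_ALL=C tr -cd '[:print:]\t\n'
}

# Made-up input containing NUL (^@), ^A and a carriage return (^M).
result=$(printf 'a\000b\001c\rd\te\n' | strip_ctrl)
printf '%s\n' "$result"
```

Because `tr` works on a stream, a file would be cleaned with something like `strip_ctrl < file.txt > cleaned.txt`; unlike `sed -i`, it cannot edit the file in place.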
10
To keep the command's behavior predictable in sed, force the "C" (POSIX) locale so that the character classes cover ASCII only; this avoids unpredictable behavior with non-ASCII characters:
LC_ALL=C sed 's/[^[:blank:][:print:]]//g' file.txt
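A small sketch of what the "C" locale implies for non-ASCII text, assuming UTF-8 input and a sed that treats each byte as a character in that locale (as GNU sed does). The sample word and the variable name `ascii_only` are made up for illustration.

```shell
# Hypothetical input: "café" encoded as UTF-8 (é is the two bytes \303\251),
# followed by a ^A control character.
ascii_only=$(printf 'caf\303\251\001\n' |
  LC_ALL=C sed 's/[^[:blank:][:print:]]//g')

# In the C locale the bytes above 127 are not printable, so they are removed
# together with the control character.
printf '%s\n' "$ascii_only"
```

If accented letters must survive, this variant is too aggressive, and a locale-aware run without `LC_ALL=C` may be the better fit.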

mklement0

NeronLeVelu
- Good point, but just to state it explicitly: Your solution also removes non-ASCII _letters_, such as `é`. – mklement0 Mar 30 '17 at 13:18
- @mklement0 Thanks for the correction. You've got the point: the problem is deciding which characters are in or out of scope. Only the OP can know, because only the OP knows the context. – NeronLeVelu Mar 31 '17 at 06:07
4
Try running one of the commands below at the Linux command prompt.
Option 1 (if the dos2unix command is installed on the Linux machine):
dos2unix sample_file.txt
Option 2:
tr -d '\015' < sample_file.txt > new_sample_file.txt
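To make the effect of Option 2 visible, here is a minimal sketch on made-up two-line input with Windows line endings; the variable name `unix_text` is an assumption for illustration.

```shell
# '\015' is the octal escape for carriage return (CR, displayed as ^M).
# Deleting it turns CRLF line endings into plain LF.
unix_text=$(printf 'line1\r\nline2\r\n' | tr -d '\015')

# od -c prints each byte; no \r should appear in the dump.
printf '%s\n' "$unix_text" | od -c
```

This removes every carriage return in the input, not only those at line ends, and it leaves other control characters such as ^A or ^[ untouched.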

Amit Kaneria
- Thank you for this, but would you mind helping me understand what `tr -d '\015'` does? – josh Oct 06 '18 at 16:35
- It deletes the character with octal code 015, which is carriage return in ASCII but could be something else in a different encoding. – RiverHeart Mar 10 '23 at 15:44
-1
Try this inside vi or vim:
or:
sed -e "s/^M//" filename > newfilename
Important: To enter ^M, type CTRL-V, then CTRL-M
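In a script, where typing CTRL-V CTRL-M is awkward, one possible alternative is to build the carriage return with printf and splice it into the sed expression. This is a sketch under assumptions: the variable names `cr` and `no_cr` and the sample input are made up.

```shell
# Store a literal carriage return in a variable instead of typing ^M.
cr=$(printf '\r')

# Same substitution as above: remove the first carriage return on each line.
no_cr=$(printf 'line1\r\nline2\r\n' | sed "s/${cr}//")
printf '%s\n' "$no_cr"
```

Applied to a file, this would read `sed -e "s/${cr}//" filename > newfilename`, mirroring the command above.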

Victor