19

How can I remove the special characters shown in blue in the picture, such as ^M, ^A, ^@, and ^[? As I understand it, ^M is the carriage return from Windows line endings, and I can remove it with sed -i 's/^M//g', but that doesn't remove the others. The dos2unix command doesn't work either. Is there a way to remove all of them?

codeforester
vinllen

4 Answers

35

Remove everything except printable characters (character class [:print:]) and tabs, with sed:

sed $'s/[^[:print:]\t]//g' file.txt

[:print:] includes:

  • [:alnum:] (alpha-numerics)
  • [:punct:] (punctuation)
  • space

ANSI-C quoting ($'...') makes the shell turn \t into a literal tab before sed sees the expression (this works in bash and similar shells).
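For shells that lack $'...' quoting, the same idea can be sketched by putting a literal tab into a variable with printf. The helper name strip_nonprint and the sample input are made up for illustration, and the sample deliberately leaves out NUL (^@) because shell variables cannot hold it.

```shell
# Hypothetical helper: delete everything except printable characters and tabs.
strip_nonprint() {
    tab=$(printf '\t')              # literal tab, obtained without $'...'
    sed "s/[^[:print:]${tab}]//g"
}

# Sample line: a, Ctrl-A (^A), b, TAB, c, ESC (^[), d, CR (^M), newline
cleaned=$(printf 'a\001b\tc\033d\r\n' | strip_nonprint)

# Dump the result byte by byte to check that only a, b, TAB, c, d remain
printf '%s\n' "$cleaned" | od -c
```

To clean a real file, something like `strip_nonprint < file.txt > clean.txt` should work the same way.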

heemayl
  • 2
    I think `tr` would do this faster: `tr -d '[[^:print:]\t]' < file.txt` – chthonicdaemon Mar 30 '17 at 04:56
  • 4
    @chthonicdaemon: Good idea, but you probably meant `tr -dC '[:print:]\t\n'` (can't use `^` with `tr`, and the outer `[]` would be taken as characters to match; also need to preserve `\n`). – mklement0 Mar 30 '17 at 12:53
  • 1
    @heemayl: Thanks for updating; it's great to show a solution that works with BSD Sed too (and I would keep that solution (too)), but it's worth noting, given that the question is tagged `linux`, that a regular single-quoted string will do with _GNU_ Sed, which does understand `\t` natively. Alternatively, replacing `\t` with `[:blank:]` would bypass the issue. – mklement0 Mar 30 '17 at 12:56
10

To make the command behave predictably, force the "C" (POSIX) locale so that sed's character classes cover ASCII only; this avoids unpredictable behavior with non-ASCII characters:

LC_ALL=C sed 's/[^[:blank:][:print:]]//g' file.txt
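A small sketch of what the C locale does to non-ASCII text, assuming UTF-8 input. The variable names are invented for the example, and the é is written as its two UTF-8 bytes in octal so the snippet does not depend on the terminal's encoding.

```shell
# "café" in UTF-8: the é is the two bytes \303\251 (octal)
input=$(printf 'caf\303\251')

# In the C locale only ASCII bytes count as printable,
# so both bytes of the é are deleted along with any control characters.
ascii_only=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[^[:blank:][:print:]]//g')

printf '%s\n' "$ascii_only"
```

Whether losing accented letters is acceptable depends on the data, so this variant fits best when the file is expected to be pure ASCII.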
mklement0
NeronLeVelu
  • 2
    Good point, but just to state it explicitly: Your solution also removes non-ASCII _letters_, such as `é`. – mklement0 Mar 30 '17 at 13:18
  • 1
    @mklement0 thanks for the correction, and you got the point: the problem is deciding which characters are in or out of scope. Only the OP can know, because he knows the context – NeronLeVelu Mar 31 '17 at 06:07
4

Try running one of the commands below at a Linux command prompt.

Option 1 (if the dos2unix command is installed on the Linux machine):

dos2unix sample_file.txt

Option 2:

tr -d '\015' < sample_file.txt > new_sample_file.txt
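To check that Option 2 did what was intended, one possibility is to count the carriage returns left afterwards. This is only a sketch: the temporary file and the variable names are made up for the demonstration.

```shell
# Build a small sample file with Windows (CRLF) line endings
tmp=$(mktemp)
printf 'one\r\ntwo\r\n' > "$tmp"

# Delete every carriage return (octal 015), as in Option 2
fixed=$(tr -d '\015' < "$tmp")

# Keep only carriage returns and count them; 0 means none are left
cr_left=$(printf '%s' "$fixed" | tr -cd '\015' | wc -c)
echo "carriage returns left: $cr_left"

rm -f "$tmp"
```

Note that tr -d '\015' removes every carriage return in the file, not only those at the end of a line.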
Amit Kaneria
  • 1
    Thank you for this, but would you mind helping me understand what `tr -d '\015'` does? – josh Oct 06 '18 at 16:35
  • It deletes the character with octal code 015 which is carriage return in ASCII but could be something else in a different encoding. – RiverHeart Mar 10 '23 at 15:44
-1

Try this inside vi or vim:

In normal mode (press ESC first), type: :%s/^M//g

or:

sed -e "s/^M//" filename > newfilename

Important: ^M here stands for a single literal carriage return, not a caret followed by M. To enter it, type CTRL-V, then CTRL-M.
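In a script, where typing CTRL-V CTRL-M is awkward, one alternative is to let printf produce the carriage return and pass it to sed through a variable. This is a sketch under that assumption; the names cr and result and the sample text are illustrative only.

```shell
# Hold a literal carriage return in a variable
# (command substitution strips trailing newlines only, so the CR survives)
cr=$(printf '\r')

# Remove a carriage return only where it ends a line;
# \$ inside double quotes reaches sed as the end-of-line anchor $
result=$(printf 'line one\r\nline two\r\n' | sed "s/${cr}\$//")

printf '%s\n' "$result" | od -c
```

Anchoring the match at the end of the line leaves any carriage return in the middle of a line untouched, which differs from the unanchored s/^M//g used in vim above.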

Victor