9

I am using this sed command to strip documents of all their (for me) unnecessary characters.

sed 's/[^a-zA-Z]/ /g'

However, after mining my data a bit, I realized a pretty basic mistake: leaving out ' turns every don't into don t, which sucks.

So I want to include ' in my regex. I'm still new to this kind of "coding", if I may call it that, so excuse my newbie mistake or, even better, explain it to me!

sed 's/[^a-zA-Z']/ /g' this obviously doesn't work

sed 's/[^a-zA-Z\']/ /g' however this doesn't either, I thought \ escapes the '?

Jakob

2 Answers

14

Good old double quotes protect the single quote without any need for escaping:

sed "s/[^a-zA-Z']/ /g" <<< "don't ... do this"

gives:

don't     do this

EDIT: your code replaces non-letters with a space, but your question says you want to strip them, so here is the other version. The first expression removes everything except letters, spaces and single quotes; the second squeezes every run of spaces into a single space (note the g flag, without which only the first run on each line is squeezed).

sed -e "s/[^ a-zA-Z']//g" -e 's/ \+/ /g' <<< "don't ... do this"

result:

don't do this
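
To check the squeezing on a line with more than one run of spaces, here is a minimal sketch of the same two expressions wrapped in a function. The name clean_text is made up for illustration, and I swapped \+ (a GNU extension in basic regex) for the portable "space, space, star" form, on the assumption that you may not always be on GNU sed.

```shell
#!/bin/sh
# clean_text is a hypothetical helper name, not part of the original answer.
# It keeps letters, spaces and apostrophes, then squeezes runs of spaces.
clean_text() {
    # "  *" (space, space, star) matches one or more spaces in POSIX basic
    # regex; the g flag applies the substitution to every run on the line.
    sed -e "s/[^ a-zA-Z']//g" -e 's/  */ /g'
}

printf '%s\n' "don't ... do this, and   don't --- do that" | clean_text
```

It reads from standard input, so it can sit at the end of any pipeline, e.g. `clean_text < some_file`.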

EDIT2: an alternative that keeps the sed script in single quotes by writing the quote character as the hex escape \x27, which is a GNU sed extension (courtesy of Sundeep):

sed 's/[^ a-zA-Z\x27]//g'

Note: I first tried to escape the single quote inside a single-quoted script, following the solutions tested here, but none of them worked for me (the shell always prompted for a line continuation), so I came up with these alternatives.
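
As for why \' fails: inside a single-quoted string the shell takes every character literally, backslash included, so the ' simply ends the string and the shell then waits for a closing quote. A rough sketch of three ways around that, none of which depend on sed extensions; the variable names input, a, b, c and q are placeholders I picked for the example:

```shell
#!/bin/sh
# Inside '...' the shell takes every character literally, backslash included,
# so \' cannot escape a quote there; the ' just ends the string.

input="don't ... do this"

# Variant 1: double quotes, where ' is an ordinary character.
a=$(printf '%s\n' "$input" | sed "s/[^a-zA-Z']/ /g")

# Variant 2: close the single-quoted string, add an escaped quote, reopen.
b=$(printf '%s\n' "$input" | sed 's/[^a-zA-Z'\'']/ /g')

# Variant 3: splice the quote in from a variable (q is a name made up here).
q="'"
c=$(printf '%s\n' "$input" | sed 's/[^a-zA-Z'"$q"']/ /g')

printf '%s\n' "$a" "$b" "$c"
```

All three hand sed the same script, s/[^a-zA-Z']/ /g, so the three printed lines should be identical. Double quotes are the least noisy choice as long as the script contains no $, backtick or backslash that the shell would interpret.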

Jean-François Fabre
4

You can also use tr -cd "'[:alnum:] "

echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:]"
somestring''''withoutspecialcharsexcept'

If you want the spaces:

echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:] "
some string '''' without special chars except '
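
Note that [:alnum:] keeps digits as well, and that -d deletes the trailing newline along with everything else. If the goal is closer to the sed in the question (letters and ' only, other characters turned into a space), a possible variant is sketched below. The helper name letters_only is made up, and I'm assuming you want runs of replaced characters collapsed to one space.

```shell
#!/bin/sh
# letters_only is a hypothetical helper name used for illustration.
letters_only() {
    # -c complements the set, so everything except letters, ' and newline
    # is translated to a space; -s squeezes each run of spaces into one.
    tr -cs "[:alpha:]'\n" ' '
}

printf '%s\n' "don't ... do 42 this" | letters_only
```

Because newline is part of the kept set, line boundaries survive, which matters if the output is processed line by line afterwards.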
Joel Griffiths