53

I'm looking for the best way to take a simple input:

echo -n "Enter a string here: "
read -e STRING

and clean it up by removing non-alphanumeric characters, lowercasing it, and replacing spaces with underscores.

Does order matter? Is tr the best / only way to go about this?

Devin Reams

5 Answers

55

As dj_segfault points out, the shell can do most of this for you. Looks like you'll have to fall back on something external for lower-casing the string, though. For this you have many options, like the perl one-liners above, etc., but I think tr is probably the simplest.

# first, strip underscores
CLEAN=${STRING//_/}
# next, replace spaces with underscores
CLEAN=${CLEAN// /_}
# now, clean out anything that's not alphanumeric or an underscore
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
# finally, lowercase with tr
CLEAN=$(printf '%s' "$CLEAN" | tr 'A-Z' 'a-z')

The order here is somewhat important. We want to get rid of underscores, plus replace spaces with underscores, so we have to be sure to strip underscores first. By waiting to pass things to tr until the end, we know we have only alphanumeric and underscores, and we can be sure we have no spaces, so we don't have to worry about special characters being interpreted by the shell.
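To make the ordering concrete, here is a minimal sketch that wraps the four steps in a function. The name `sanitize` and the sample input are my own assumptions for illustration, not part of the answer above.

```shell
#!/usr/bin/env bash
# sanitize: hypothetical helper name, used only for this illustration.
sanitize() {
  local s=$1
  s=${s//_/}                # 1. drop any underscores already present
  s=${s// /_}               # 2. turn spaces into underscores
  s=${s//[^a-zA-Z0-9_]/}    # 3. keep only alphanumerics and underscores
  printf '%s' "$s" | tr 'A-Z' 'a-z'   # 4. lowercase the result
}

sanitize 'Hello, World_2!'   # prints hello_world2
echo
```

If steps 1 and 2 were swapped, the underscores that stand in for spaces would be stripped along with the original ones, which is why the underscore removal has to come first.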

Thomee
  • 2
    Note to reader: If you are having trouble making this work, check your shebang to see if you're calling bash or sh, and how your system interprets 'sh'. – JD. Nov 06 '12 at 18:40
  • 2
    As of Bash 4, it can do case modification also. `lowercase=${CLEAN,,}` [Bash Hackers Wiki](http://wiki.bash-hackers.org/syntax/pe) explains parameter expansions in a more *human-readable* way than man pages. – toxalot Mar 17 '14 at 20:54
  • Nice work. I wasn't previously aware of these shell features. Thanks! I just discovered that zsh allows you to actually *nest* all of these, so you can do it in one line: `echo -n ${${${str//_/}// /_}//[^a-zA-Z0-9_]/} | tr A-Z a-z` ..not that I would recommend putting something that incomprehensible in a script. :) (edit: formatting) – Jon Carter Jul 26 '15 at 17:23
  • very nice. It may need also a : LC_ALL=C before all the a-z A-Z invocations to be sure it doesn't leave any weird things (depending on your locale, or someone else's locale, a-z, A-Z, and maybe even 0-9 can mean a lot of different things...) – Olivier Dulac May 31 '21 at 07:57
45

Bash can do this all on its own, thank you very much. If you look at the section of the man page on Parameter Expansion, you'll see that bash has built-in substitutions, substring, trim, rtrim, etc.

To eliminate all non-alphanumeric characters, do

CLEANSTRING=${STRING//[^a-zA-Z0-9]/}

That's Occam's razor. No need to launch another process.
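For anyone who hasn't used these expansions before, here is a small sketch of the substitution, substring, and trim forms mentioned above. The variable names and the sample value are assumptions made up for the example.

```shell
#!/usr/bin/env bash
STRING='  Hello, World!  '             # example input (assumed)

alnum_only=${STRING//[^a-zA-Z0-9]/}    # substitution: HelloWorld
first_five=${alnum_only:0:5}           # substring:    Hello
no_prefix=${alnum_only#Hello}          # trim a leading match:  World
no_suffix=${alnum_only%World}          # trim a trailing match: Hello

printf '%s\n' "$alnum_only" "$first_five" "$no_prefix" "$no_suffix"
```

All four run inside the shell itself, so no extra process is started.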

dj_segfault
4

For Bash >= 4.0:

CLEAN="${STRING//_/}" && \
CLEAN="${CLEAN// /_}" && \
CLEAN="${CLEAN//[^a-zA-Z0-9_]/}" && \
CLEAN="${CLEAN,,}"

This is especially useful for creating container names programmatically using docker/podman. However, in this case you'll also want to remove the underscores:

# Sanitize $STRING for a container name
CLEAN="${STRING//[^a-zA-Z0-9]/}" && \
CLEAN="${CLEAN,,}"
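A possible way to package the container-name variant is sketched below. The function name `container_name` and the choice to fail when nothing is left after sanitizing are my own assumptions, and `${s,,}` still requires Bash >= 4.0.

```shell
#!/usr/bin/env bash
# container_name: hypothetical helper; needs Bash >= 4.0 for ${var,,}.
container_name() {
  local s=${1//[^a-zA-Z0-9]/}   # keep ASCII letters and digits only
  s=${s,,}                      # lowercase (Bash 4+)
  if [[ -z $s ]]; then
    # Assumption: an empty result is treated as an error.
    echo "container_name: nothing left after sanitizing" >&2
    return 1
  fi
  printf '%s\n' "$s"
}

container_name 'My App_v2!'   # prints myappv2
```

The result could then be passed on, for example as `--name "$(container_name "$STRING")"`.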
0

After a bit of looking around it seems tr is indeed the simplest way:

export CLEANSTRING="$(printf '%s' "$STRING" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | tr '[:upper:]' '[:lower:]')"

Occam's razor, I suppose.
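The same pipeline can be sketched as a reusable function. Here I use an underscore as the replacement character, as the question originally asked; the function name `sanitize_tr` is an assumption for illustration.

```shell
#!/bin/sh
# sanitize_tr: hypothetical helper name; uses only tr, no bash-specific syntax.
sanitize_tr() {
  printf '%s' "$1" |
    tr -cd '[:alnum:][:space:]' |   # delete everything except alphanumerics and whitespace
    tr '[:space:]' '_' |            # whitespace becomes underscores
    tr '[:upper:]' '[:lower:]'      # lowercase
}

sanitize_tr 'Hello, World!'   # prints hello_world
echo
```

Note that the first `tr` also deletes underscores already in the input, since they are not alphanumeric.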

Devin Reams
  • 1
    if you set the `STRING=$(rm /tmp/*)`, if you echo the $STRING before cleaning, it will execute the sub-shell and remove your /tmp/ content... so you need to sanitize it BEFORE any echo is done – higuita May 04 '16 at 16:29
0

You could run it through perl.

export CLEANSTRING=$(perl -e 'print join( q//, map { s/\s+/_/g; lc } split /[^\s\w]+/, $ENV{STRING} )')

I'm using ksh-style `$(...)` command substitution here; it is part of POSIX and works in bash too. Note that STRING has to be exported for perl to see it in %ENV.

The nice thing about the shell is that you can use perl, awk, sed, grep, and so on.

Axeman