21

I have three types of strings that I'd like to capitalize in a bash script. I figured sed/awk would be my best bet, but I'm not sure. What's the best way given the following requirements?

  1. single word
    e.g. taco -> Taco

  2. multiple words separated by hyphens
    e.g. my-fish-tacos -> My-Fish-Tacos

  3. multiple words separated by underscores
    e.g. my_fish_tacos -> My_Fish_Tacos

Benjamin W.
GregB

6 Answers

33

There's no need to use capture groups (although & is one, in a way):

echo "taco my-fish-tacos my_fish_tacos" | sed 's/[^ _-]*/\u&/g'

The output:

Taco My-Fish-Tacos My_Fish_Tacos

The escaped lowercase "u" (\u) uppercases the next character of the replacement, which here is the first character of the matched substring (&).
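Since the question is about a bash script, here is a minimal sketch of how this substitution might be wrapped in a function. The name capitalize is hypothetical, and the sketch assumes GNU sed, because \u is a GNU extension:

```shell
# Hypothetical helper (not part of the answer): capitalize the first
# character of every run of characters delimited by spaces, hyphens,
# or underscores. Assumes GNU sed, since \u is a GNU extension.
capitalize() {
    printf '%s\n' "$1" | sed 's/[^ _-]*/\u&/g'
}

capitalize "taco"            # prints: Taco
capitalize "my-fish-tacos"   # prints: My-Fish-Tacos
capitalize "my_fish_tacos"   # prints: My_Fish_Tacos
```

The result can be captured in the usual way, for example name=$(capitalize "my_fish_tacos").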

Dennis Williamson
  • How would I modify this to handle words that are all-caps? For example, my-FISH-TACOS should output My-Fish-Tacos. – GregB Aug 06 '12 at 06:09
  • 5
    @GregB: Tell it to lowercase all the characters then uppercase the next one: `sed 's/[^ _-]*/\L\u&/g'` – Dennis Williamson Aug 06 '12 at 10:32
  • 3
    Note: this is a GNU sed extension. BSD sed users (including OS X) can't do this. – Jashank Jeremy Mar 03 '14 at 03:56
  • @DennisWilliamson I invite you to take a look at https://unix.stackexchange.com/questions/413562/replacing-different-string-with-different-new-string-that-follows-a-pattern/413591#comment742018_413591 – alhelal Dec 29 '17 at 14:16
8

Using awk:

echo 'test' | awk '{
     for ( i=1; i <= NF; i++) {
         sub(".", substr(toupper($i), 1,1) , $i);
         print $i;
         # or
         # print substr(toupper($i), 1,1) substr($i, 2);
     }
}'
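The loop above splits only on whitespace and prints each word on its own line, so an input such as my-fish-tacos comes back as My-fish-tacos. A minimal awk sketch that walks the line character by character and treats space, hyphen, and underscore as word boundaries might look like this (the function name capitalize_awk and the variable names are illustrative, not from the answer):

```shell
# Hypothetical helper: capitalize the first character after the start of
# the line and after every space, hyphen, or underscore.
capitalize_awk() {
    printf '%s\n' "$1" | awk '{
        out = ""
        at_start = 1                  # the next non-separator starts a word
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[ _-]/) {
                at_start = 1          # separator: keep it, flag a new word
            } else if (at_start) {
                c = toupper(c)        # first character of a word
                at_start = 0
            }
            out = out c
        }
        print out
    }'
}

capitalize_awk "taco my-fish-tacos my_fish_tacos"
# prints: Taco My-Fish-Tacos My_Fish_Tacos
```

This uses only toupper, substr, and length, so it should not depend on any particular awk implementation.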
Sergii Stotskyi
  • A bit of explanation about the example above. **NF** is a built-in awk variable holding the number of fields (by default, the whitespace-separated strings on a line); in this example it is 1. **substr** returns a substring; its signature is **substr(string, start, length)**. **sub** is the substitution function: **sub(regex, replacement, target)**. – Viktor Nonov Jan 12 '16 at 00:47
  • 2
    Note: it’s probably slightly more efficient to use `toupper(substr(...` instead of `substr(toupper(...`. – sam hocevar Apr 14 '16 at 17:15
6

Try the following:

sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g'

It works for me using GNU sed, but I don't think BSD sed supports \U and \L.
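For systems whose sed lacks \U and \L, one option is to avoid sed's case escapes altogether. The following is a sketch in plain POSIX shell plus tr, with a hypothetical function name. It only uppercases word-initial characters and leaves the rest of each word untouched, and it starts one tr process per word, so it is meant for short strings:

```shell
# Hypothetical helper: capitalize the first character of each word, where
# words are separated by spaces, hyphens, or underscores.
# The body runs in a subshell ( ... ) so its variables do not leak.
capitalize_words() (
    rest=$1
    out=
    prev=' '                          # treat start of string as a boundary
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}        # first character of what is left
        rest=${rest#?}                # drop that character
        case $prev in
            ' '|-|_) ch=$(printf '%s' "$ch" | tr '[:lower:]' '[:upper:]') ;;
        esac
        out=$out$ch
        prev=$ch
    done
    printf '%s\n' "$out"
)

capitalize_words "my-fish-tacos"   # prints: My-Fish-Tacos
```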

Andrew Clark
3

Here is a solution that does not use \u, which not all sed implementations support.

Save this script as capitalize.sed, then run: sed -i -f capitalize.sed FILE

s:^:.:
h
y/qwertyuiopasdfghjklzxcvbnm/QWERTYUIOPASDFGHJKLZXCVBNM/ 
G 
s:$:\n:
:r
/^.\n.\n/{s:::;p;d}
/^[^[:alpha:]][[:alpha:]]/ {
    s:.\(.\)\(.*\):x\2\1: 
    s:\n\(..\):\nx: 
    tr
}

/^[[:alpha:]][[:alpha:]]/ {
    s:\n.\(.\)\(.*\)$:\nx\2\1:
    s:..:x:
    tr
}
/^[^\n]/ {
    s:^.\(.\)\(.*\)$:.\2\1:
    s:\n..:\n.:
    tr
}
alinsoar
1

alinsoar's mind-blowing solution doesn't work at all in Plan9 sed and doesn't work correctly in busybox sed. But you should still try to figure out how it's supposed to do its thing: you will learn a lot about sed.

Here's a less clever but easier-to-understand version which works in at least Plan9, busybox, and GNU sed (and probably BSD and macOS). For Plan9 sed, remove the backslashes in the match part of each s command.

#! /bin/sed -f

y/PYFGCRLAOEUIDHTNSQJKXBMWVZ/pyfgcrlaoeuidhtnsqjkxbmwvz/

s/\(^\|[^A-Za-z]\)a/\1A/g
s/\(^\|[^A-Za-z]\)b/\1B/g
s/\(^\|[^A-Za-z]\)c/\1C/g
s/\(^\|[^A-Za-z]\)d/\1D/g
s/\(^\|[^A-Za-z]\)e/\1E/g
s/\(^\|[^A-Za-z]\)f/\1F/g
s/\(^\|[^A-Za-z]\)g/\1G/g
s/\(^\|[^A-Za-z]\)h/\1H/g
s/\(^\|[^A-Za-z]\)i/\1I/g
s/\(^\|[^A-Za-z]\)j/\1J/g
s/\(^\|[^A-Za-z]\)k/\1K/g
s/\(^\|[^A-Za-z]\)l/\1L/g
s/\(^\|[^A-Za-z]\)m/\1M/g
s/\(^\|[^A-Za-z]\)n/\1N/g
s/\(^\|[^A-Za-z]\)o/\1O/g
s/\(^\|[^A-Za-z]\)p/\1P/g
s/\(^\|[^A-Za-z]\)q/\1Q/g
s/\(^\|[^A-Za-z]\)r/\1R/g
s/\(^\|[^A-Za-z]\)s/\1S/g
s/\(^\|[^A-Za-z]\)t/\1T/g
s/\(^\|[^A-Za-z]\)u/\1U/g
s/\(^\|[^A-Za-z]\)v/\1V/g
s/\(^\|[^A-Za-z]\)w/\1W/g
s/\(^\|[^A-Za-z]\)x/\1X/g
s/\(^\|[^A-Za-z]\)y/\1Y/g
s/\(^\|[^A-Za-z]\)z/\1Z/g
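The 26 near-identical substitution lines follow a single pattern, so they can be generated rather than typed. This is a sketch, with a hypothetical function name gen_capitalize_sed, that prints an equivalent script. Its y command lists the alphabet in plain order rather than the order used above, which does not change the effect. Like the script above, it relies on \| in the match part, so the same portability caveats apply:

```shell
# Hypothetical generator: print a sed script equivalent to the one above.
# The body runs in a subshell ( ... ) so its variables do not leak.
gen_capitalize_sed() (
    # Lowercase everything first, as the script above does.
    printf '%s\n' 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
    for lower in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        upper=$(printf '%s' "$lower" | tr 'a-z' 'A-Z')
        # Emits one line of the form: s/\(^\|[^A-Za-z]\)a/\1A/g
        printf 's/\\(^\\|[^A-Za-z]\\)%s/\\1%s/g\n' "$lower" "$upper"
    done
)

# Show the first three generated lines.
gen_capitalize_sed | sed -n '1,3p'
```

The generated text can be redirected to a file and run with sed -f, in the same way as the hand-written script.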
Neale Pickett
0

This might work for you (GNU sed):

echo "aaa bbb ccc aaa-bbb-ccc aaa_bbb_ccc aaa-bbb_ccc"  | sed 's/\<.\|_./\U&/g'
Aaa Bbb Ccc Aaa-Bbb-Ccc Aaa_Bbb_Ccc Aaa-Bbb_Ccc
potong