Capitalize strings in sed or awk

Question

I have three types of strings that I'd like to capitalize in a bash script. I figured sed/awk would be my best bet, but I'm not sure. What's the best way given the following requirements?

single word
e.g. taco -> Taco
multiple words separated by hyphens
e.g. my-fish-tacos -> My-Fish-Tacos
multiple words separated by underscores
e.g. my_fish_tacos -> My_Fish_Tacos

score 33 · Accepted Answer · answered Aug 03 '12 at 21:36

There's no need to use capture groups (although & acts as one, in a way):

echo "taco my-fish-tacos my_fish_tacos" | sed 's/[^ _-]*/\u&/g'

The output:

Taco My-Fish-Tacos My_Fish_Tacos

The escaped lower case "u" capitalizes the next character in the matched sub-string.
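
To use this from a bash script, the one-liner could be wrapped in a small function. This is only a sketch: the name `capitalize` is made up for illustration, and it assumes the `sed` on the PATH is GNU sed, since `\u` is a GNU extension.

```shell
# capitalize: hypothetical helper name, not part of the original answer.
# Assumes GNU sed, because \u is a GNU extension.
capitalize() {
    # Each run of characters other than space, underscore, or hyphen is a
    # "word"; \u uppercases the first character of the matched text (&).
    printf '%s\n' "$1" | sed 's/[^ _-]*/\u&/g'
}

capitalize taco            # prints: Taco
capitalize my-fish-tacos   # prints: My-Fish-Tacos
capitalize my_fish_tacos   # prints: My_Fish_Tacos
```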

Dennis Williamson

How would I modify this to handle words that are all-caps? For example, my-FISH-TACOS should output My-Fish-Tacos. – GregB Aug 06 '12 at 06:09

@GregB: Tell it to lowercase all the characters then uppercase the next one: `sed 's/[^ _-]*/\L\u&/g'` – Dennis Williamson Aug 06 '12 at 10:32

Note: this is a GNU sed extension. BSD sed users (including OS X) can't do this. – Jashank Jeremy Mar 03 '14 at 03:56
@DennisWilliamson I invite you to https://unix.stackexchange.com/questions/413562/replacing-different-string-with-different-new-string-that-follows-a-pattern/413591#comment742018_413591 – alhelal Dec 29 '17 at 14:16
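
The two points raised in the comments above (the `\L\u` form for all-caps input, and the fact that these escapes are GNU-only) might be combined as follows. This is a sketch under assumptions: both function names are hypothetical, and the probe simply checks whether the local `sed` turns `a` into `A` when given `\u&`.

```shell
# sed_has_case_escapes: hypothetical helper; succeeds only if this sed
# understands the GNU \u escape in the replacement text.
sed_has_case_escapes() {
    [ "$(printf 'a\n' | sed 's/a/\u&/' 2>/dev/null)" = "A" ]
}

# capitalize_normalized: hypothetical helper; lowercases each word, then
# uppercases its first character (the \L\u form from the comment above).
capitalize_normalized() {
    printf '%s\n' "$1" | sed 's/[^ _-]*/\L\u&/g'
}

if sed_has_case_escapes; then
    capitalize_normalized my-FISH-TACOS   # prints: My-Fish-Tacos
else
    printf '%s\n' 'this sed lacks the \u and \L escapes' >&2
fi
```

Probing for the behavior rather than the sed version keeps the check independent of how a particular implementation reports itself.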

score 8 · Answer 2 · answered Aug 03 '12 at 21:33

Using awk:

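echo 'test' | awk '{
     # Fields are split on whitespace only, so hyphens and underscores
     # inside a field are not treated as word boundaries here.
     for ( i=1; i <= NF; i++) {
         # Replace the first character of the field with its uppercase form.
         sub(".", substr(toupper($i), 1,1) , $i);
         print $i;   # each field is printed on its own line
         # or
         # print substr(toupper($i), 1,1) substr($i, 2);
     }
}'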

Sergii Stotskyi

A bit of explanation about the example above: **NF** is a built-in awk variable holding the number of fields (generally, how many whitespace-separated strings there are on a row); in this example it is 1. **substr** returns a substring; its signature is **substr(string, start, length)**. **sub** is the substitute function: **sub(regex, replacement, target)**. – Viktor Nonov Jan 12 '16 at 00:47

Note: it’s probably slightly more efficient to use `toupper(substr(...` instead of `substr(toupper(...`. – sam hocevar Apr 14 '16 at 17:15
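
The awk answer above splits on whitespace only, so it does not cover the hyphen and underscore cases from the question. One way to handle them, offered as a sketch rather than the answerer's method, is to walk the line character by character. The function name `capitalize_awk` and the `at_start` flag are made up for illustration; only `substr`, `toupper`, and `length` are used, all of which are standard awk functions.

```shell
# capitalize_awk: hypothetical helper name. Uses only standard awk string
# functions, so it should not depend on GNU extensions.
capitalize_awk() {
    printf '%s\n' "$1" | awk '{
        out = ""
        at_start = 1                      # next character begins a word
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == " " || c == "_" || c == "-") {
                out = out c               # separator: copy it through
                at_start = 1
            } else if (at_start) {
                out = out toupper(c)      # first character of a word
                at_start = 0
            } else {
                out = out c               # rest of the word is unchanged
            }
        }
        print out
    }'
}

capitalize_awk "taco my-fish-tacos my_fish_tacos"
# prints: Taco My-Fish-Tacos My_Fish_Tacos
```

To normalize all-caps input as well, the last branch could append `tolower(c)` instead of `c`.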

score 6 · Answer 3 · answered Aug 03 '12 at 21:23

Try the following:

sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g'

It works for me using GNU sed, but I don't think BSD sed supports \U and \L.

Andrew Clark

alinsoar · Answer 4 · 2016-12-25T12:13:03.527

Here is a solution that does not use \u, which not all sed implementations support.

Save this script as capitalize.sed, then run sed -i -f capitalize.sed FILE

s:^:.:
h
y/qwertyuiopasdfghjklzxcvbnm/QWERTYUIOPASDFGHJKLZXCVBNM/ 
G 
s:$:\n:
:r
/^.\n.\n/{s:::;p;d}
/^[^[:alpha:]][[:alpha:]]/ {
    s:.\(.\)\(.*\):x\2\1: 
    s:\n\(..\):\nx: 
    tr
}

/^[[:alpha:]][[:alpha:]]/ {
    s:\n.\(.\)\(.*\)$:\nx\2\1:
    s:..:x:
    tr
}
/^[^\n]/ {
    s:^.\(.\)\(.*\)$:.\2\1:
    s:\n..:\n.:
    tr
}

Neale Pickett · Answer 5 · 2014-02-07T00:15:35.207

alinsoar's mind-blowing solution doesn't work at all in Plan9 sed, or correctly in busybox sed. But you should still try to figure out how it's supposed to do its thing: you will learn a lot about sed.

Here's a not-as-clever but easier to understand version which works in at least Plan9, busybox, and GNU sed (and probably BSD and MacOS). Plan9 sed needs backslashes removed in the match part of the s command.

#! /bin/sed -f

y/PYFGCRLAOEUIDHTNSQJKXBMWVZ/pyfgcrlaoeuidhtnsqjkxbmwvz/

s/\(^\|[^A-Za-z]\)a/\1A/g
s/\(^\|[^A-Za-z]\)b/\1B/g
s/\(^\|[^A-Za-z]\)c/\1C/g
s/\(^\|[^A-Za-z]\)d/\1D/g
s/\(^\|[^A-Za-z]\)e/\1E/g
s/\(^\|[^A-Za-z]\)f/\1F/g
s/\(^\|[^A-Za-z]\)g/\1G/g
s/\(^\|[^A-Za-z]\)h/\1H/g
s/\(^\|[^A-Za-z]\)i/\1I/g
s/\(^\|[^A-Za-z]\)j/\1J/g
s/\(^\|[^A-Za-z]\)k/\1K/g
s/\(^\|[^A-Za-z]\)l/\1L/g
s/\(^\|[^A-Za-z]\)m/\1M/g
s/\(^\|[^A-Za-z]\)n/\1N/g
s/\(^\|[^A-Za-z]\)o/\1O/g
s/\(^\|[^A-Za-z]\)p/\1P/g
s/\(^\|[^A-Za-z]\)q/\1Q/g
s/\(^\|[^A-Za-z]\)r/\1R/g
s/\(^\|[^A-Za-z]\)s/\1S/g
s/\(^\|[^A-Za-z]\)t/\1T/g
s/\(^\|[^A-Za-z]\)u/\1U/g
s/\(^\|[^A-Za-z]\)v/\1V/g
s/\(^\|[^A-Za-z]\)w/\1W/g
s/\(^\|[^A-Za-z]\)x/\1X/g
s/\(^\|[^A-Za-z]\)y/\1Y/g
s/\(^\|[^A-Za-z]\)z/\1Z/g
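
Since the 26 substitution lines differ only in the letter, they could be generated instead of typed by hand. The following is a sketch; the function name `generate_capitalize_sed` is hypothetical, and the `y` command is written in a-z order rather than the order used above, which gives the same mapping.

```shell
# generate_capitalize_sed: hypothetical helper that prints a sed script
# equivalent to the one above, one substitution rule per letter.
generate_capitalize_sed() {
    # Lowercase everything first (same mapping as the y command above,
    # with the alphabet written in a-z order).
    printf '%s\n' 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
    for lower in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        upper=$(printf '%s' "$lower" | tr 'a-z' 'A-Z')
        # Emits a rule such as:  s/\(^\|[^A-Za-z]\)a/\1A/g
        printf 's/\\(^\\|[^A-Za-z]\\)%s/\\1%s/g\n' "$lower" "$upper"
    done
}
```

Redirecting the output, for example `generate_capitalize_sed > capitalize.sed`, gives a file that can be run with `sed -f capitalize.sed`. As with the hand-written version, Plan9 sed would need the backslashes in the match part removed.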

score 0 · Answer 6 · answered Aug 04 '12 at 07:05

This might work for you (GNU sed):

echo "aaa bbb ccc aaa-bbb-ccc aaa_bbb_ccc aaa-bbb_ccc"  | sed 's/\<.\|_./\U&/g'
Aaa Bbb Ccc Aaa-Bbb-Ccc Aaa_Bbb_Ccc Aaa-Bbb_Ccc

potong
