Regex: capitalize the first letter of every word, also after a special character like a dash

Question

I use this to capitalize the first letter of every word:

#(\s|^)([a-z0-9-_]+)#i

I also want it to capitalize a letter that follows a special character such as a dash (-).

Now it shows:

This Is A Test For-stackoverflow

And I want this:

This Is A Test For-Stackoverflow

Do you also need to capitalize non-ASCII letters (`à`, `ü` etc.)? What language are you using? — Tim Pietzcker, Jun 06 '11 at 13:10

score 36 · Answer 1 · answered Apr 24 '15 at 23:00

+1 for word boundaries; here is a comparable JavaScript solution. It also accounts for possessives:

var re = /(\b[a-z](?!\s))/g;
var s = "fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon"; 
s = s.replace(re, function(x){return x.toUpperCase();});
console.log(s); // "Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon"
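For comparison, here is a minimal Python sketch of the same pattern (the name capitalize_words is illustrative, not from the answer). Python's re.sub replaces every match without needing a g flag. The second call shows the limitation raised in the comments: a one-letter word such as "a" is followed by a space, so the lookahead rejects it.

```python
import re

# Same pattern as the JavaScript answer: a lowercase letter at a word
# boundary that is not followed by whitespace.
PATTERN = re.compile(r"\b[a-z](?!\s)")


def capitalize_words(text: str) -> str:
    # Each match is a single letter, so upper() changes only that letter.
    return PATTERN.sub(lambda m: m.group().upper(), text)


print(capitalize_words("fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon"))
# Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon
print(capitalize_words("this is a test for-stackoverflow"))
# This Is a Test For-Stackoverflow
```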

answered by NotNedLudd

toUpperCase is capitalizing the whole word. Here is the solution: `s.replace(re, function(x){return x.charAt(0).toUpperCase() + x.slice(1);});` – Polopollo May 09 '16 at 20:26
@Polopollo, in this case the regex is only returning one letter if it matches but globally. So there is no need for that extra coding and it should work as is. – adam-beck Apr 26 '17 at 19:51
This will not work as OP has asked since a single character would not get capitalized. Just for anybody who comes to this question like I did. – adam-beck Apr 26 '17 at 19:51
I fear this doesn't work: word boundaries include things like '. So `don't` becomes `Don'T` – Anderas Apr 13 '18 at 05:28
@Anderas that's what the negative lookahead is for: `(?!\s)` checks if it's not a character before whitespace. On the other hand, this fails when a word like `don't` is followed by a non-whitespace, non-alphanumeric character like a comma, period or exclamation mark. It would be better to use a word boundary in the lookahead: `/(\b[a-z](?!\b))/g;` – Guido Bouman May 03 '18 at 12:22
@GuidoBouman: Your suggested regex fails for Coeur D'Alene and O'Fallon though. – davemyron May 23 '19 at 00:56
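A small Python sketch that runs both lookaheads from this thread on the examples mentioned, so the trade-off is visible side by side. It assumes Python's re, whose \b behaves like JavaScript's on ASCII text; upper_matches is a hypothetical helper name.

```python
import re


def upper_matches(pattern: str, text: str) -> str:
    # Uppercase every (single-letter) match of `pattern` in `text`.
    return re.sub(pattern, lambda m: m.group().upper(), text)


samples = ["don't.", "coeur d'alene, o'fallon"]

# Original answer: letter not followed by whitespace.
print([upper_matches(r"\b[a-z](?!\s)", s) for s in samples])
# ["Don'T.", "Coeur D'Alene, O'Fallon"]

# Suggested variant: letter not followed by another word boundary.
print([upper_matches(r"\b[a-z](?!\b)", s) for s in samples])
# ["Don't.", "Coeur d'Alene, o'Fallon"]
```

Neither lookahead handles both cases: (?!\s) breaks contractions before punctuation, while (?!\b) skips the single letter before an apostrophe.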

score 20 · Answer 2 · answered Jun 06 '11 at 11:42

A simple solution is to use word boundaries:

#\b[a-z0-9-_]+#i

Alternatively, you can list the few characters that may precede a word:

#([\s\-_]|^)([a-z0-9-_]+)#i
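A minimal Python sketch of the word-boundary approach, assuming each match is handed to a callback that uppercases its first character (the answer does not show the replacement step). One caution: because - sits inside [a-z0-9-_], For-stackoverflow is consumed as a single match and only its F would be touched, so this sketch drops the dash from the class.

```python
import re

# Word-boundary version without "-" in the class, so a dash ends a word.
WORD = re.compile(r"\b[a-z0-9_]+", re.IGNORECASE)


def capitalize_words(text: str) -> str:
    # Uppercase only the first character of each matched word.
    return WORD.sub(lambda m: m.group()[0].upper() + m.group()[1:], text)


print(capitalize_words("This Is A Test For-stackoverflow"))
# This Is A Test For-Stackoverflow
```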

answered by Kobi

@Tim - I took artistic freedom and didn't change the way the OP matches letters - It's *possible* Simmer wants the letter as output, change their colors or whatnot. Also, I didn't give it that much thought, I only had 4 minutes `:P` – Kobi Jun 06 '11 at 14:35
Could someone please add a jsfiddle example? That would be helpful. – Pravin W Jun 09 '16 at 10:33
Which language's regex is this for? – JohnK Jun 22 '17 at 15:32
@JohnK - Both of these are simple enough and should work in all languages. `#` is a separator here, so your language may need `"\\b[a-z0-9-_]+"` and an `IgnoreCase` flag. – Kobi Jun 22 '17 at 15:44

score 11 · Answer 3 · answered Dec 17 '20 at 16:34

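If you want to use pure regular expressions, you can use the \u case-change operator in the replacement. It is supported by tools such as GNU sed, Perl and Vim, but not by every regex flavor.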

To transform this string:

This Is A Test For-stackoverflow

into

This Is A Test For-Stackoverflow

Use (.+)-(.+) to capture the text before and after the "-", and then replace it with:

$1-\u$2

In bash (with GNU sed, since \u is a GNU extension):

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\2/'
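Not every engine supports \u in the replacement. Python's re.sub, for instance, rejects it (since Python 3.7 an unknown letter escape such as \u in the replacement string raises re.error), so a minimal sketch passes a function instead. The name upper_after_dash is illustrative.

```python
import re


def upper_after_dash(text: str) -> str:
    # Same pattern as above: group 1 is the text before the dash,
    # group 2 the text after it. The callback keeps group 1 as-is and
    # uppercases the first character of group 2, mirroring $1-\u$2.
    return re.sub(
        r"(.+)-(.+)",
        lambda m: m.group(1) + "-" + m.group(2)[:1].upper() + m.group(2)[1:],
        text,
    )


print(upper_after_dash("This Is A Test For-stackoverflow"))
# This Is A Test For-Stackoverflow
```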

answered by Jaime Roman

This works amazingly, thank you. However, could you please link some documentation on how the \u flag actually works and how it capitalizes the first letter of the second group? – Harshal Feb 08 '23 at 07:08
@Harshal, I don't understand your question, but I'll try to answer. With the "\u" flag you are saying that you want to capitalize the first character of what follows it. If you use the expression (.+)-(.+), $2 holds whatever is matched inside the second parentheses, which is why "\u$2" capitalizes its first letter; in this example it is the "S" of stackoverflow – Jaime Roman Feb 08 '23 at 07:31
Thank you, to be more precise, I was curious about any documentation which says, "\u" does the capitalisation of first letter. It works as expected, I just wanted to read more about this flag. – Harshal Feb 09 '23 at 06:35

score 8 · Answer 4 · answered Jun 06 '11 at 11:59

Actually, you don't need to match the full string; just match the first lowercase letter of each word, like this:

'~\b([a-z])~'
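A minimal Python sketch of the same pattern. It needs no flag, since re.sub replaces every match by default (the JavaScript equivalent is the g flag mentioned in the comments). Because an apostrophe also creates a word boundary, a letter after one is capitalized too, as the second call shows.

```python
import re


def capitalize_words(text: str) -> str:
    # \b([a-z]): a lowercase letter directly after a word boundary.
    return re.sub(r"\b([a-z])", lambda m: m.group(1).upper(), text)


print(capitalize_words("This Is A Test For-stackoverflow"))
# This Is A Test For-Stackoverflow
print(capitalize_words("harper's ferry"))
# Harper'S Ferry
```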

answered by anubhava

In JS, I've added `g`, as in `/\b([a-z])/g`, to capitalize each word – Stalin Gino Dec 06 '14 at 07:53
I like your lovely answer, @StalinGino; I must say it is the only one I was able to understand. – Danish Feb 08 '16 at 11:38
That is as per the requirements. Check all other answers as well. – anubhava May 24 '20 at 04:08

score 4 · Answer 5 · answered May 23 '20 at 18:46

For JavaScript, here’s a solution that works across different languages and alphabets:

const originalString = "this is a test for-stackoverflow"
const processedString = originalString.replace(/(?:^|\s|[-"'([{])+\S/g, (c) => c.toUpperCase())

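It matches any non-whitespace character \S that is preceded by the start of the string ^, whitespace \s, or any of the characters -"'([{ (the + allows several of these in a row), and replaces the match with its uppercase variant. Uppercasing leaves the whitespace and punctuation unchanged, so only the letter changes.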
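The same pattern carries over to Python almost unchanged. A minimal sketch, assuming Python 3 str input, where upper() also handles non-ASCII letters such as é:

```python
import re

# One or more separators (start of string, whitespace, or one of -"'([{ )
# followed by a non-whitespace character.
PATTERN = re.compile(r"""(?:^|\s|[-"'([{])+\S""")


def capitalize_words(text: str) -> str:
    # The match includes the separators; upper() leaves them unchanged,
    # so effectively only the final character is capitalized.
    return PATTERN.sub(lambda m: m.group().upper(), text)


print(capitalize_words("this is a test for-stackoverflow"))
# This Is A Test For-Stackoverflow
print(capitalize_words("élan vital (ça va)"))
# Élan Vital (Ça Va)
```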

score 3 · Answer 6 · answered Jan 22 '21 at 22:35

My solution using JavaScript:

function capitalize(str) {
  var reg = /\b([a-zÁ-ú]{3,})/g;
  return str.replace(reg, (w) => w.charAt(0).toUpperCase() + w.slice(1));
}

With ES6+ JavaScript:

const capitalize = str =>
    str.replace(/\b([a-zÁ-ú]{3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));

The expression goes inside /<expression-here>/g. Here is how it is built up, using the sample text "sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas":

[a-zÁ-ú] matches the lowercase ASCII letters plus the range from Á (U+00C1) to ú (U+00FA), which covers accented uppercase and lowercase letters such as Á, É, á, é, ã and ç (though not À or ü).

[a-zÁ-ú]{3,} requires at least three such letters in a row, so short words such as "de", "às" and "e" are skipped.

\b([a-zÁ-ú]{3,}) finally requires the run to start at a word boundary, so only whole words are selected. The parentheses group the expression.

Then, for the selected words (those that start in lowercase), I uppercase the first letter:

w.charAt(0).toUpperCase() + w.slice(1); // output -> Output

Putting the two together:

str.replace(/\b([a-zÁ-ú]{3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));

result:
Sábado de Janeiro às 19h. Sexta-Feira de Janeiro às 21 e Horas
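A Python port of the same expression, as a minimal sketch. One difference worth knowing: Python 3's \b is Unicode-aware, while JavaScript's \b only treats ASCII letters, digits and _ as word characters. So in JavaScript a word that starts with an accented letter, such as "água", has no boundary before the "á" and comes out as "áGua", whereas this Python version produces "Água".

```python
import re

# Lowercase ASCII letters plus the Latin-1 range Á (U+00C1) to ú (U+00FA),
# at least three in a row, starting at a word boundary.
PATTERN = re.compile(r"\b([a-zÁ-ú]{3,})")


def capitalize(text: str) -> str:
    # Uppercase the first letter of each selected word.
    return PATTERN.sub(lambda m: m.group(1)[0].upper() + m.group(1)[1:], text)


print(capitalize("sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas"))
# Sábado de Janeiro às 19h. Sexta-Feira de Janeiro às 21 e Horas
print(capitalize("água fria"))
# Água Fria
```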

score 1 · Answer 7 · edited Jun 08 '22 at 14:53

Python solution:

>>> import re
>>> the_string = 'this is a test for stack-overflow'
>>> re.sub(r'(((?<=\s)|^|-)[a-z])', lambda x: x.group().upper(), the_string)
'This Is A Test For Stack-Overflow'

For details, read about "positive lookbehind" assertions.
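A short sketch of why the pattern keeps ^ outside the lookbehind as ((?<=\s)|^|-): Python's re only accepts fixed-width lookbehinds, and \s|^ mixes a one-character branch with a zero-width one. The helper names are illustrative.

```python
import re

TEXT = "this is a test for stack-overflow"


def capitalize_words(text: str) -> str:
    # The answer's pattern: (?<=\s) checks for a preceding whitespace
    # character without consuming it; ^ and - are separate alternatives.
    return re.sub(r"(((?<=\s)|^|-)[a-z])", lambda m: m.group().upper(), text)


def lookbehind_with_anchor_compiles() -> bool:
    # Putting ^ inside the lookbehind gives it a width of 0 or 1,
    # which Python's re module rejects when compiling the pattern.
    try:
        re.compile(r"(?<=\s|^)[a-z]")
    except re.error:
        return False
    return True


print(capitalize_words(TEXT))             # This Is A Test For Stack-Overflow
print(lookbehind_with_anchor_compiles())  # False
```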

edited by DᴀʀᴛʜVᴀᴅᴇʀ; answered Jan 18 '19 at 21:02 by nmz787

score 1 · Answer 8 · answered Jun 08 '22 at 15:34

While this answer's pure regular-expression solution is accurate:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\2/'

note that when using any of the case-change operators:

\l            Change case of only the first character to the right to lowercase. (Note: lowercase 'L')
\L            Change case of all text to the right to lowercase.
\u            Change case of only the first character to the right to uppercase.
\U            Change case of all text to the right to uppercase.

the end delimiter should be used:

\E

so the end result should be:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\E\2/'
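Python's re.sub has no \l, \L, \u, \U or \E operators, so the same effects are usually obtained by passing a function as the replacement. A minimal sketch (upper_first and lower_first are illustrative helper names): since each group is transformed separately, no case-conversion span is ever left open, so nothing like \E is needed.

```python
import re


def upper_first(s: str) -> str:
    # Rough equivalent of \u: uppercase only the first character.
    return s[:1].upper() + s[1:]


def lower_first(s: str) -> str:
    # Rough equivalent of \l: lowercase only the first character.
    return s[:1].lower() + s[1:]


text = "This Is A Test For-stackoverflow"

# sed 's/\(.\)-\(.\)/\1-\u\2/' replaces only the first match, hence count=1.
first = re.sub(r"(.)-(.)", lambda m: m.group(1) + "-" + upper_first(m.group(2)), text, count=1)
# Like \U applied to everything after the dash.
rest = re.sub(r"(.)-(.*)", lambda m: m.group(1) + "-" + m.group(2).upper(), text, count=1)

print(first)                # This Is A Test For-Stackoverflow
print(rest)                 # This Is A Test For-STACKOVERFLOW
print(lower_first("Test"))  # test
```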

Sedecimdies · Answer 9 · 2013-09-17T10:11:04.520

This turns

r.e.a.c de boeremeakers

into

R.E.A.C De Boeremeakers

with the pattern

(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])

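using this VB.NET code (it assumes Imports System.Text and Imports System.Text.RegularExpressions, with the input in a String named inputText):

    Dim matches As MatchCollection = Regex.Matches(inputText, "(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])")
    Dim outputText As New StringBuilder
    If matches.Count = 0 Then
        ' Nothing to capitalize: copy the input unchanged
        outputText.Append(inputText)
    Else
        ' Copy any text before the first match unchanged
        outputText.Append(inputText.Substring(0, matches(0).Index))
        For Each m As Match In matches
            ' Uppercase the matched letter
            outputText.Append(UCase(m.Value))
            ' Copy the text between this match and the next one (or the end of the string)
            Dim nextMatch As Match = m.NextMatch()
            Dim stopAt As Integer = If(nextMatch.Success, nextMatch.Index, inputText.Length)
            outputText.Append(inputText.Substring(m.Index + 1, stopAt - m.Index - 1))
        Next
    End If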
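The same pattern in Python needs two adjustments, shown in this minimal sketch: it uses Python's (?P<name>...) syntax for the named group, and \A moves out of the lookbehind, because Python only allows fixed-width lookbehinds and \A|[ .] mixes a zero-width branch with a one-character one. With re.sub and a function, the manual copying loop is not needed. capitalize_initials is an illustrative name.

```python
import re

# \A is tried as its own alternative; the lookbehind keeps a fixed width of 1.
PATTERN = re.compile(r"(?:\A|(?<=[ .]))(?P<up>[a-z])(?=[a-z. ])")


def capitalize_initials(text: str) -> str:
    # Lookarounds consume nothing, so each match is just the letter to uppercase.
    return PATTERN.sub(lambda m: m.group("up").upper(), text)


print(capitalize_initials("r.e.a.c de boeremeakers"))
# R.E.A.C De Boeremeakers
```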
