42

I use this to capitalize the first letter of every word:

#(\s|^)([a-z0-9-_]+)#i

I also want it to capitalize a letter that follows a special character such as a dash (-).

Now it shows:

This Is A Test For-stackoverflow

And I want this:

This Is A Test For-Stackoverflow

snakecharmerb
Simmer

9 Answers

36

+1 for word boundaries, and here is a comparable JavaScript solution. This accounts for possessives as well:

var re = /(\b[a-z](?!\s))/g;
var s = "fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon"; 
s = s.replace(re, function(x){return x.toUpperCase();});
console.log(s); // "Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon"
NotNedLudd
  • toUpperCase is capitalizing the whole word. Here is the solution: s.replace(re, function(x){return x.charAt(0).toUpperCase() + x.slice(1);}); – Polopollo May 09 '16 at 20:26
  • @Polopollo, in this case the regex is only returning one letter if it matches but globally. So there is no need for that extra coding and it should work as is. – adam-beck Apr 26 '17 at 19:51
  • This will not work as OP has asked since a single character would not get capitalized. Just for anybody who comes to this question like I did. – adam-beck Apr 26 '17 at 19:51
  • I fear this doesn't work: word boundaries include things like '. So `don't` becomes `Don'T` – Anderas Apr 13 '18 at 05:28
  • @Anderas that's what the negative lookahead is for: `(?!\s)` checks if it's not a character before whitespace. On the other hand, this fails when a word like `don't` is followed by a non-whitespace, non-alphanumeric character like a comma, period or exclamation mark. It would be better to use a word boundary in the lookahead: `/(\b[a-z](?!\b))/g;` – Guido Bouman May 03 '18 at 12:22
  • @GuidoBouman: Your suggested regex fails for Coeur D'Alene and O'Fallon though. – davemyron May 23 '19 at 00:56
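
The cases raised in these comments can be checked side by side. Here is a minimal Python sketch (not the answer's JavaScript) that applies both lookahead variants to a few of the strings mentioned; the function names are made up for illustration.

```python
import re

# Variant from the answer: a letter at a word boundary, not followed by whitespace.
NOT_BEFORE_SPACE = re.compile(r"\b[a-z](?!\s)")
# Variant from the comments: a letter at a word boundary, not followed by another boundary.
NOT_BEFORE_BOUNDARY = re.compile(r"\b[a-z](?!\b)")


def cap_not_before_space(text):
    """Uppercase every letter matched by the answer's pattern."""
    return NOT_BEFORE_SPACE.sub(lambda m: m.group().upper(), text)


def cap_not_before_boundary(text):
    """Uppercase every letter matched by the pattern suggested in the comments."""
    return NOT_BEFORE_BOUNDARY.sub(lambda m: m.group().upper(), text)


if __name__ == "__main__":
    for sample in ["harper's ferry", "don't, stop", "o'fallon", "a test"]:
        print(sample, "|", cap_not_before_space(sample), "|", cap_not_before_boundary(sample))
```

With these inputs, the first variant turns "don't, stop" into "Don'T, Stop", the second turns "o'fallon" into "o'Fallon", and neither capitalizes the single-letter word in "a test", which matches what the commenters describe.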
20

A simple solution is to use word boundaries:

#\b[a-z0-9-_]+#i

Alternatively, you can match only after a few specific characters:

#([\s\-_]|^)([a-z0-9-_]+)#i
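
As a rough illustration of the second pattern, here is a Python sketch. It makes one assumption that is not in the answer: the hyphen is dropped from the word character class and only the first character after the separator is matched, because with the hyphen inside the class for-stackoverflow would be consumed as a single word. The helper name is hypothetical.

```python
import re

# Group 1: start of string, whitespace, underscore or dash.
# Group 2: the single letter or digit that starts the next word.
# Assumption of this sketch: the dash is a separator, not a word character.
WORD_START = re.compile(r"(^|[\s_-])([a-z0-9])", re.IGNORECASE)


def capitalize_words(text):
    """Uppercase the first character of every word, keeping the separator."""
    return WORD_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


if __name__ == "__main__":
    print(capitalize_words("this is a test for-stackoverflow"))
```

For the question's input this prints This Is A Test For-Stackoverflow.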
Kobi
  • @Tim - I took artistic freedom and didn't change the way the OP matches letters - It's *possible* Simmer wants the letter as output, change their colors or whatnot. Also, I didn't give it that much thought, I only had 4 minutes `:P` – Kobi Jun 06 '11 at 14:35
  • Could someone please add a jsfiddle example? It would be helpful. – Pravin W Jun 09 '16 at 10:33
  • Which language's regex is this for? – JohnK Jun 22 '17 at 15:32
  • @JohnK - Both of these are simple enough and should work in all languages. `#` is a separator here, so your language may need `"\\b[a-z0-9-_]+"` and an `IgnoreCase` flag. – Kobi Jun 22 '17 at 15:44
11

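If you want to do this with the regular expression alone, some engines (GNU sed is one) support the \u escape in the replacement string, which uppercases the next character.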

To transform this string:

This Is A Test For-stackoverflow

into

This Is A Test For-Stackoverflow

Use (.+)-(.+) to capture the text before and after the "-", then use this as the replacement:

$1-\u$2

In bash, with GNU sed:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\2/'
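
Python's re module has no \u escape in replacement templates, so the closest equivalent there is a replacement function. A minimal sketch, with a hypothetical helper name:

```python
import re


def upper_after_dash(text):
    """Uppercase the character that directly follows each dash."""
    # Group 1 is the single character after the dash, like \2 in the sed command.
    return re.sub(r"-(.)", lambda m: "-" + m.group(1).upper(), text)


if __name__ == "__main__":
    print(upper_after_dash("This Is A Test For-stackoverflow"))
```

Unlike the sed command without the g flag, re.sub replaces every occurrence by default; passing count=1 limits it to the first.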

Jaime Roman
  • This works amazingly, thank you. However, could you please link some documentation on how the \u flag actually works and how it capitalizes the first letter of the second group? – Harshal Feb 08 '23 at 07:08
  • @Harshal, I don't understand your question, but I'll try to answer. With the "\u" flag you are saying that you want to capitalize the first character of the string that follows. If you use the expression (.+)-(.+), the value $2 is what corresponds to what is inside the second parenthesis. That's why "\u$2" is capitalizing the first letter; in this example it is the "S" of stackoverflow. – Jaime Roman Feb 08 '23 at 07:31
  • Thank you. To be more precise, I was curious about any documentation which says that "\u" capitalizes the first letter. It works as expected, I just wanted to read more about this flag. – Harshal Feb 09 '23 at 06:35
8

Actually, you don't need to match the full word; just match the first lowercase letter at a word boundary, like this:

'~\b([a-z])~'
anubhava
4

For JavaScript, here’s a solution that works across different languages and alphabets:

const originalString = "this is a test for-stackoverflow"
const processedString = originalString.replace(/(?:^|\s|[-"'([{])+\S/g, (c) => c.toUpperCase())

It matches any non-whitespace character \S that is preceded by the start of the string ^, whitespace \s, or any of the characters -"'([{, and replaces the match with its uppercase variant.

Michael Schmid
3

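My solution using JavaScript:

function capitalize(str) {
  // Words of three or more letters (a-z plus accented letters in the range Á-ú)
  var reg = /\b([a-zÁ-ú]{3,})/g;
  // Uppercase the first letter of each match and keep the rest unchanged
  return str.replace(reg, (w) => w.charAt(0).toUpperCase() + w.slice(1));
}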

With ES6+ JavaScript:

const capitalize = str => 
    str.replace(/\b([a-zÁ-ú]{3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));



/<expression-here>/g
  1. [a-zÁ-ú] matches the lowercase letters a-z plus the accented letters in the range Á-ú, both upper- and lowercase. ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas
  2. [a-zÁ-ú]{3,} requires at least three such letters, so shorter words are skipped.
    ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas
  3. \b([a-zÁ-ú]{3,}) anchors the match at a word boundary, so only whole words are selected. The parentheses group the expression.
    ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas

With the words selected, I capitalize the first letter of each match (w is the matched word):

w.charAt(0).toUpperCase() + w.slice(1); // output -> Output

Joining the two:

str.replace(/\b(([a-zÁ-ú]){3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));

result:
Sábado de Janeiro às 19h. Sexta-Feira de Janeiro às 21 e Horas
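
A rough Python sketch of the same idea, for comparison. For str patterns Python's \b and \w are Unicode-aware, so the explicit Á-ú range is not needed; [^\W\d_] is used here as an approximation of "any letter". The function name and the NFC normalization step are assumptions of this sketch, not part of the answer.

```python
import re
import unicodedata

# A run of three or more letters starting at a word boundary.
LONG_WORD = re.compile(r"\b[^\W\d_]{3,}")


def capitalize_long_words(text):
    """Capitalize words of three or more letters, leaving short words alone."""
    # Normalize so accented letters are single code points (assumption: NFC input is wanted).
    text = unicodedata.normalize("NFC", text)
    return LONG_WORD.sub(lambda m: m.group()[0].upper() + m.group()[1:], text)


if __name__ == "__main__":
    print(capitalize_long_words("sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas"))
```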

1

Python solution:

>>> import re
>>> the_string = 'this is a test for stack-overflow'
>>> re.sub(r'(((?<=\s)|^|-)[a-z])', lambda x: x.group().upper(), the_string)
'This Is A Test For Stack-Overflow'

Read about the "positive lookbehind" assertion, (?<=\s), which is used here.

DᴀʀᴛʜVᴀᴅᴇʀ
nmz787
1

While this answer for a pure Regular Expression solution is accurate:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\2/'

it should be noted that GNU sed has several case-change operators:

\l            Change only the next character to lowercase. (Note: lowercase 'L')
\L            Change all following text to lowercase, until \U or \E.
\u            Change only the next character to uppercase.
\U            Change all following text to uppercase, until \L or \E.

\L and \U stay in effect until the end delimiter:

\E

\u and \l affect a single character and need no delimiter, so the command above is complete as written. The same result using \U, closed with \E:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\U\2\E/'
DᴀʀᴛʜVᴀᴅᴇʀ
-1

this will make

R.E.A.C De Boeremeakers

from

r.e.a.c de boeremeakers

(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])

using

    ' Requires: Imports System.Text.RegularExpressions
    Dim pattern As String = "(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])"
    ' Regex.Replace copies the unmatched text unchanged and uppercases each matched letter.
    Dim outputText As String = Regex.Replace(inputText, pattern, Function(m As Match) m.Value.ToUpper())
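
For comparison, a hedged Python sketch of the same pattern. Python's re module requires fixed-width lookbehinds, so (?<=\A|[ .]) is split here into \A or a one-character lookbehind; the function name is made up.

```python
import re

# A lowercase letter at the start of the string or right after a space or dot,
# and followed by a letter, dot or space.
INITIAL = re.compile(r"(?:\A|(?<=[ .]))(?P<up>[a-z])(?=[a-z. ])")


def capitalize_initials(text):
    """Uppercase each letter matched by the named group 'up'."""
    return INITIAL.sub(lambda m: m.group("up").upper(), text)


if __name__ == "__main__":
    print(capitalize_initials("r.e.a.c de boeremeakers"))
```

As in the original pattern, the lookahead means a letter at the very end of the string is left unchanged.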
Sedecimdies