
I need to convert to title case the following:

  1. First word in a phrase;

  2. Other words in the same phrase whose length is greater than minLength.

I was looking at ToTitleCase, but the result is not what I expected.

So the phrase "the car is very fast" with minLength = 2 would become "The Car is Very Fast".

I was able to make the first word uppercase using:

Char[] letters = source.ToCharArray();
letters[0] = Char.ToUpper(letters[0]);

And to get the words I was using:

Regex.Matches(source, @"\b(\w|['-])+\b");

But I am not sure how to put all this together.

Thank You, Miguel

SethO
Miguel Moura

4 Answers


Sample code:

string input = "i have the car which is very fast";
int minLength = 2;
string regexPattern = string.Format(@"^\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());
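To check the pattern against the phrase from the question, the snippet above can be wrapped in a small helper. This is only a sketch; the class name `MinLengthTitleCase` and the method `Apply` are made up for illustration. The look-ahead `(?=\w{n})` requires n more word characters after the first letter, so a word is capitalized when its length is greater than minLength.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from the answer.
public static class MinLengthTitleCase
{
    // Uppercases the first letter of the string and the first letter of every
    // word that has more than minLength word characters.
    public static string Apply(string input, int minLength)
    {
        // {{ and }} are escaped braces for string.Format, so with minLength = 2
        // the pattern becomes ^\w|\b\w(?=\w{2}).
        string pattern = string.Format(@"^\w|\b\w(?=\w{{{0}}})", minLength);
        return Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());
    }
}
```

For example, `MinLengthTitleCase.Apply("the car is very fast", 2)` returns "The Car is Very Fast", which is the result asked for in the question.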

UPDATE (for cases where you have multiple sentences in a single string):

string input = "i have the car which is very fast. me is slow.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|\.)\s*)\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());

Output:

I Have The Car Which is Very Fast. Me is Slow.

You may also wish to handle !, ? and other sentence-ending symbols; in that case you can use the following. You can add as many sentence-terminating symbols to the character class as you wish.

string input = "i have the car which is very fast! me is slow.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|[.!?])\s*)\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());
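If the list of terminators grows, it may be easier to build the pattern from a string of characters than to edit the character class by hand. A minimal sketch, with hypothetical names `SentenceTitleCase`, `BuildPattern` and `Apply`. It joins the terminators into an alternation and escapes each one with `Regex.Escape`, rather than putting them inside `[...]`, because `Regex.Escape` does not escape `]` or `-`, which have a special meaning inside a character class.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: same pattern as above, terminators passed in.
public static class SentenceTitleCase
{
    public static string BuildPattern(string terminators, int minLength)
    {
        // An empty alternative would match before every letter, so require input.
        if (string.IsNullOrEmpty(terminators))
            throw new ArgumentException("At least one terminator is required.", nameof(terminators));

        // ".!?" becomes \.|!|\? so each terminator is matched literally.
        string alternation = string.Join("|",
            terminators.Select(c => Regex.Escape(c.ToString())));
        return string.Format(@"(?<=(^|{0})\s*)\w|\b\w(?=\w{{{1}}})", alternation, minLength);
    }

    public static string Apply(string input, string terminators, int minLength)
    {
        string pattern = BuildPattern(terminators, minLength);
        return Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());
    }
}
```

`SentenceTitleCase.Apply("i have the car which is very fast! me is slow.", ".!?", 2)` returns "I Have The Car Which is Very Fast! Me is Slow.", the same as the hand-written pattern above.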

UPDATE (2): convert e-marketing to E-Marketing (treat - as a valid word character):

string input = "i have the car which is very fast! me is slow. it is very nice to learn e-marketing these days.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|[.!?])\s*)\w|\b\w(?=[-\w]{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());
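The answer doesn't show the output of this last version, so here is a sketch that wraps the final pattern in a helper (hypothetical names `HyphenAwareTitleCase` and `Apply`) so it can be run against the sample sentence.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from UPDATE (2).
public static class HyphenAwareTitleCase
{
    // '-' counts toward the length look-ahead, so in "e-marketing" both the 'e'
    // and the 'm' after the hyphen are capitalized.
    public static string Apply(string input, int minLength)
    {
        string pattern = string.Format(@"(?<=(^|[.!?])\s*)\w|\b\w(?=[-\w]{{{0}}})", minLength);
        return Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());
    }
}
```

With minLength = 2 the sample input becomes "I Have The Car Which is Very Fast! Me is Slow. It is Very Nice to Learn E-Marketing These Days." One side effect: because `-` now counts toward the length, a short word directly followed by a hyphen is treated as long, so `Apply("a to-do list", 2)` gives "A To-do List".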
Ulugbek Umirov
  • How can your example be extended to all words with length greater than 2, and to the first word and words after a dot even if their length is smaller than 2? – Miguel Moura May 25 '14 at 21:39
  • I tested it and it seems to work fine with one exception. If I have a word like "e-marketing" I would like it to become "E-Marketing", but at the moment I get "e-Marketing". Is it possible to solve this with your example? Thank You – Miguel Moura May 25 '14 at 21:48
  • @MDMoura Added processing of `-`. – Ulugbek Umirov May 25 '14 at 21:52
  • @MDMoura, shouldn't the correct title case be "E-marketing" (similar to "E-commerce")? –  May 25 '14 at 21:56
  • @elgonzo There are no strict rules: http://laurenedwardssv.blogspot.com.tr/2010/03/e-commerce-e-commerce-or-not-so-easy-to.html – Ulugbek Umirov May 25 '14 at 21:56

English Title Case is extremely complicated. And it is not computable. Period.

The best you can get is a routine that changes all small words according to a list of your preferences. This will still be wrong for all verbal expressions. While an extended list of variants could capture many of these, some would still be impossible to decide without semantic analysis. Two examples:

  • Running on/On Empty
  • Working on/On a Building

The latter does become clear from the context; the former does not. There is a clear difference in meaning, but the computer can't decide which is right.

(Sometimes even humans can't. I asked about the first example on a StackExchange forum and got no acceptable answer.)

Here is a list of replacements I like, though some four-letter words (no pun intended) are personal choices. Some might also argue that all kinds of numerics, like any, all and few, should be capitalized.

This is anything but elegant; in fact it is something of an embarrassment. But it works rather well for me, so I use it on a regular basis and have fed 100k+ titles through it:

using System.Globalization;  // CultureInfo, TextInfo

public string ETC(string title)
{  // English title capitalization
    if (title == null) return "";

    string s = title.Trim().Replace('`', '\'');      // replace backticks with apostrophes

    TextInfo UsaTextInfo = new CultureInfo("en-US", false).TextInfo;
    s = UsaTextInfo.ToTitleCase(s);              // caps for all words

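    // a list of exceptions one way or the other.
    // (ToTitleCase has already capitalized every word that was not all caps,
    // so the lowercase-to-uppercase entries below only act as a safety net.)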
    s = s.Replace(" A ", " a ");
    s = s.Replace(" also ", " Also ");
    s = s.Replace(" An ", " an ");
    s = s.Replace(" And ", " and ");
    s = s.Replace(" as ", " As ");
    s = s.Replace(" At ", " at ");
    s = s.Replace(" be ", " Be ");
    s = s.Replace(" But ", " but ");
    s = s.Replace(" By ", " by ");
    s = s.Replace(" For ", " for ");
    s = s.Replace(" From ", " from ");
    s = s.Replace(" if ", " If ");
    s = s.Replace(" In ", " in ");
    s = s.Replace(" Into ", " into ");
    s = s.Replace(" he ", " He ");
    s = s.Replace(" has ", " Has ");
    s = s.Replace(" had ", " Had ");
    s = s.Replace(" is ", " Is ");
    s = s.Replace(" my ", " My ");
    s = s.Replace("   ", "  ");                // no triple spaces
    s = s.Replace("'N'", "'n'");          // Rock 'n' Roll
    s = s.Replace(" 'N ", " 'n ");        // Rock 'n Roll
    s = s.Replace(" no ", " No ");
    s = s.Replace(" Nor ", " nor ");
    s = s.Replace(" Not ", " not ");
    s = s.Replace(" Of ", " of ");
    s = s.Replace(" Off ", " off ");
    s = s.Replace(" On ", " on ");
    s = s.Replace(" Onto ", " onto ");
    s = s.Replace(" Or ", " or ");
    s = s.Replace(" O'c ", " O'C ");
    s = s.Replace(" Over ", " over ");
    s = s.Replace(" so ", " So ");
    s = s.Replace(" To ", " to ");
    s = s.Replace(" that ", " That ");
    s = s.Replace(" this ", " This ");
    s = s.Replace(" thus ", " Thus ");
    s = s.Replace(" The ", " the ");
    s = s.Replace(" Too ", " too ");
    s = s.Replace(" when ", " When ");
    s = s.Replace(" With ", " with ");
    s = s.Replace(" Up ", " up ");
    s = s.Replace(" Yet ", " yet ");
    // a few(!) verbal expressions
    s = s.Replace(" Get up ", " Get Up ");
    s = s.Replace(" Give up ", " Give Up ");
    s = s.Replace(" Givin' up ", " Givin' Up ");
    s = s.Replace(" Grow up ", " Grow Up ");
    s = s.Replace(" Hung up ", " Hung Up ");
    s = s.Replace(" Make up ", " Make Up ");
    s = s.Replace(" Wake Me up ", " Wake Me Up ");
    s = s.Replace(" Mixed up ", " Mixed Up ");
    s = s.Replace(" Shut up ", " Shut Up ");
    s = s.Replace(" Stand up ", " Stand Up ");            
    s = s.Replace(" Wind up ", " Wind Up ");
    s = s.Replace(" Wake up ", " Wake Up ");
    s = s.Replace(" Come up ", " Come Up ");
    s = s.Replace(" Working on ", " Working On ");
    s = s.Replace(" Waiting on ", " Waiting On ");
    s = s.Replace(" Turn on ", " Turn On ");
    s = s.Replace(" Move on ", " Move On ");
    s = s.Replace(" Keep on ", " Keep On ");
    s = s.Replace(" Bring It on ", " Bring It On ");
    s = s.Replace(" Hold on ", " Hold On ");
    s = s.Replace(" Hang on ", " Hang On ");
    s = s.Replace(" Go on ", " Go On ");
    s = s.Replace(" Coming on ", " Coming On ");
    s = s.Replace(" Come on ", " Come On ");
    s = s.Replace(" Call on ", " Call On ");
    s = s.Replace(" Trust in ", " Trust In ");
    s = s.Replace(" Fell in ", " Fell In ");
    s = s.Replace(" Falling in ", " Falling In ");
    s = s.Replace(" Fall in ", " Fall In ");
    s = s.Replace(" Faith in ", " Faith In ");
    s = s.Replace(" Come in ", " Come In ");
    s = s.Replace(" Believe in ", " Believe In ");

    return s.Trim();
}

Note that there are still quite a few rules that can't be implemented like this.

Some basic rules are not so hard: capitalize the first and last word, and capitalize all verbs (Is), adjectives (Red), pronouns (He), nouns (Ace) and numbers (One), even if they have fewer than 3 (or 4) letters.

But the exceptions are hard, e.g.: don't capitalize prepositions, unless they are part of a verbal expression...

Example 1: 'Working on/On a Building' - You have to know that it is a gospel song to decide that it is 'On'.

Example 2: 'Running On/on Empty'. It could mean 'Running On' or running (with the fuel gauge) 'on Empty'.

So in the end you will have to live with a compromise.
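The basic rules listed above (capitalize the first and last word, lowercase a chosen list of small words) can also be applied word by word, which avoids padding every replacement with spaces. This is a minimal sketch, not TaW's routine: the class name `ListTitleCase`, the short `MinorWords` list and the en-US culture are assumptions, and like any list-based approach it still cannot tell when a small word belongs to a verbal expression.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical word-by-word variant of the list-based approach.
public static class ListTitleCase
{
    // Illustrative, deliberately short list of words kept lowercase.
    static readonly HashSet<string> MinorWords = new HashSet<string>(
        new[] { "a", "an", "and", "at", "but", "by", "for", "in", "nor",
                "of", "on", "or", "the", "to", "up", "with" },
        StringComparer.OrdinalIgnoreCase);

    public static string Apply(string title)
    {
        if (title == null) return "";
        TextInfo ti = new CultureInfo("en-US", false).TextInfo;
        string[] words = title.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < words.Length; i++)
        {
            // First and last word are always capitalized; other listed words are lowered.
            bool firstOrLast = i == 0 || i == words.Length - 1;
            words[i] = (!firstOrLast && MinorWords.Contains(words[i]))
                ? words[i].ToLowerInvariant()
                : ti.ToTitleCase(words[i]);
        }
        return string.Join(" ", words);
    }
}
```

`ListTitleCase.Apply("a tale of two cities")` returns "A Tale of Two Cities", and `ListTitleCase.Apply("what are you waiting for")` returns "What Are You Waiting For" because the last word is always capitalized.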

TaW

One alternative (and naive) solution that does not require a regular expression is to use the String.Split method and the LINQ Select method to express the more complex condition:

using System;
using System.Linq;

var text = @"i have the car which is very fast. me is slow.";
var length = 2;
var first = true; // first word in the sentence
var containsDot = false; // previous word contains a dot
var words = text
                .Split(' ')
                .Select(p =>
                    {
                        if (first)
                        {
                            p = FirstCharToUpper(p);
                            first = false;
                        }
                        if (containsDot)
                        {
                            p = FirstCharToUpper(p);
                            containsDot = false;
                        }
                        containsDot = p.Contains(".");
                        if (p.Length > length)
                        {
                            return FirstCharToUpper(p);
                        }
                        return p;
                    });
var result = string.Join(" ", words); // join the words back with single spaces
Console.WriteLine(result);

The output is:

I Have The Car Which is Very Fast. Me is Slow.

The FirstCharToUpper method is from this SO post:

public static string FirstCharToUpper(string input)
{
    // Empty strings (e.g. from consecutive spaces in Split(' ')) are returned unchanged
    if (String.IsNullOrEmpty(input))
        return input;
    return char.ToUpper(input[0]) + input.Substring(1);
}

The drawback of this solution: the more complex the condition, the more complex and unreadable the Select statement becomes, but it is an alternative to regex.
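One way to keep the Select readable as conditions grow is to use the `Select` overload that also passes the element index, so the previous word can be looked up directly instead of being tracked in captured variables. A sketch with hypothetical names `SplitTitleCase` and `Apply`; it applies the same rules as the code above.

```csharp
using System;
using System.Linq;

// Hypothetical stateless variant of the Split/Select approach.
public static class SplitTitleCase
{
    // Uppercases the first character; empty strings (from consecutive spaces)
    // are returned unchanged.
    static string FirstCharToUpper(string word) =>
        word.Length == 0 ? word : char.ToUpper(word[0]) + word.Substring(1);

    // A word is capitalized if it is the first one, if the previous word
    // contains a dot, or if it is longer than minLength.
    public static string Apply(string text, int minLength)
    {
        string[] words = text.Split(' ');
        return string.Join(" ", words.Select((word, i) =>
            i == 0 || words[i - 1].Contains(".") || word.Length > minLength
                ? FirstCharToUpper(word)
                : word));
    }
}
```

`SplitTitleCase.Apply("i have the car which is very fast. me is slow.", 2)` gives the same output as above. Because the lambda has no side effects, enumerating the result more than once also gives the same answer, which is not the case with the captured `first`/`containsDot` flags.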

keenthinker

Here is an approach which uses a StringBuilder and pure string methods and does not need to use regex, so it should be quite efficient:

using System;
using System.Globalization;
using System.Text;

public static string ToTitleCase(string input, int minLength = 0)
{
    TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
    string titleCaseDefault = ti.ToTitleCase(input);
    if (minLength == 0)
        return titleCaseDefault;
    StringBuilder sb = new StringBuilder(titleCaseDefault.Length);
    int wordCount = 0;
    char[] wordSeparatorChars = " \t\n.,;-:".ToCharArray();

    for (int i = 0; i < titleCaseDefault.Length; i++)
    {
        char c = titleCaseDefault[i];
        // Separators (including punctuation such as '.') are copied as-is; treating
        // them as a word start would yield an empty word and loop forever.
        bool isSeparator = char.IsWhiteSpace(c) || Array.IndexOf(wordSeparatorChars, c) >= 0;
        if (!isSeparator)
        {
            wordCount++;
            int nextSeparator = titleCaseDefault.IndexOfAny(wordSeparatorChars, i);
            int endIndex = nextSeparator >= 0 ? nextSeparator : titleCaseDefault.Length;
            string word = titleCaseDefault.Substring(i, endIndex - i);
            if (wordCount == 1) // first word upper
                sb.Append(word);
            else
                sb.Append(word.Length < minLength ? word.ToLower() : ti.ToTitleCase(word));
            i = endIndex - 1; // continue with the separator after the word
        }
        else
            sb.Append(c);
    }
    return sb.ToString();
}

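Your sample data (note that here minLength is the shortest word length that stays capitalized, so 3 corresponds to minLength = 2 in the question):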

string text = "the car is very fast";
string output = ToTitleCase(text, 3); // "The Car is Very Fast"
Tim Schmelter