
I am trying to find out whether an existing library or method can capitalize each word in a string when the words are not separated by spaces. For example:

"blueparrot" should be converted to "BlueParrot"

I know that if there were spaces, I could accomplish this with:

using System.Globalization;

var textInfo = new CultureInfo("en-US", false).TextInfo;
var result = textInfo.ToTitleCase("blue parrot"); // "Blue Parrot"

However, without the spacing, the outcome is

Blueparrot

Bagzli
  • This would be rather difficult, though, wouldn't it? How would you handle compound words? – Kenneth K. Feb 18 '16 at 16:20
  • You need a dictionary of every existing word. – Tim Schmelter Feb 18 '16 at 16:21
  • I can think of a way where it goes through each letter and combines them into words wherever it finds a match in a dictionary, for example by always taking the longest word it can form. Yes, it would not be perfect, but I think it is possible. Anyhow, I thought I would ask and see if there was a genius out there who had figured it out and written a method. – Bagzli Feb 18 '16 at 16:21
  • @KennethK. -- I'm thinking more slow than difficult, and certainly there would be ambiguities. If you've got access to a dictionary table and an `IsWord()` method, you can bite off characters one at a time until you've got a word. Wash, rinse and repeat. – Bob Kaufman Feb 18 '16 at 16:22
  • Good luck with a dictionary, and biting off characters until you get a word: when the word is `Totaleclipse` you'll end up with `ToTaleClipSe`. – Jamiec Feb 18 '16 at 16:27
  • [This post](http://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database) may inspire. – Bob Kaufman Feb 18 '16 at 16:28
  • Depending on how you implement it, sure it could be slow. But my understanding of why NLP is so difficult is that a computer cannot decipher the meaning of words and sentences. If I write `In the mean time it takes to read this sentence...`, how does the computer determine whether or not I meant "mean time" or "meantime"? In a true NLP you'd have more information, I believe, but in this question you won't have enough information to make the determination. It'll be a crude manipulation, but the OP seems OK with that. – Kenneth K. Feb 18 '16 at 16:29
  • @Jamiec - excellent example! Using a [stopword list](http://www.ranks.nl/stopwords) might help in those cases. – Bob Kaufman Feb 18 '16 at 16:29
  • @BobKaufman - I note that neither "blue" nor "Parrot" is on that list – Jamiec Feb 18 '16 at 16:33
  • Along the lines of @Jamiec's first comment, words starting with S will often be a problem because it would be hard for the algorithm to know whether the previous word was plural or if the S is part of the next word. – Steve Barron Feb 18 '16 at 16:40
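
The dictionary approach discussed in the comments above could be sketched roughly as follows. This is a minimal sketch, not an existing library method: `WordSplitter`, `Split`, `ToPascalCase`, and the tiny `Words` list are all hypothetical names introduced for illustration, and a real word list would have to be loaded from a dictionary source. It tries the longest dictionary word first and backtracks to shorter ones when the remainder cannot be split.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSplitter
{
    // Hypothetical, tiny word list for illustration only;
    // a real one would be loaded from a dictionary file or database.
    private static readonly HashSet<string> Words =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "blue", "parrot", "to", "tale", "clip", "total", "eclipse"
        };

    // Returns the words that make up text from index start onward,
    // or null if that remainder cannot be split into dictionary words.
    private static List<string> Split(string text, int start)
    {
        if (start == text.Length) return new List<string>();

        // Try the longest candidate first, then back off to shorter ones.
        for (int len = text.Length - start; len >= 1; len--)
        {
            string candidate = text.Substring(start, len);
            if (!Words.Contains(candidate)) continue;

            List<string> rest = Split(text, start + len);
            if (rest != null)
            {
                rest.Insert(0, candidate);
                return rest;
            }
        }
        return null;
    }

    // Capitalizes each word found; returns the input unchanged if no split exists.
    public static string ToPascalCase(string text)
    {
        List<string> parts = Split(text, 0);
        if (parts == null) return text;

        return string.Concat(parts.Select(w =>
            char.ToUpperInvariant(w[0]) + w.Substring(1).ToLowerInvariant()));
    }
}
```

With this word list, `WordSplitter.ToPascalCase("blueparrot")` returns `"BlueParrot"` and `WordSplitter.ToPascalCase("totaleclipse")` returns `"TotalEclipse"`, because the longest match `total` wins over `to`. Preferring the longest match is only a heuristic: it cannot decide between "mean time" and "meantime", and without memoization the backtracking can become slow on long inputs.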

1 Answer


There is no easy solution to this. Something puts that string of two words together in the first place, and it is at that point that you need to either record where the word break is or format the string appropriately.
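
As an illustration of formatting at the point where the words are combined, here is a minimal sketch that assumes the individual words are still available separately at that point. `NameBuilder` and `JoinPascalCase` are hypothetical names, not part of any library; only `CultureInfo`, `TextInfo.ToTitleCase`, and `string.Concat` come from .NET.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NameBuilder
{
    // Hypothetical helper: title-cases each word while the word
    // boundaries are still known, then joins the words without spaces.
    public static string JoinPascalCase(params string[] words)
    {
        TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;

        // Lower-case first, because ToTitleCase leaves all-uppercase
        // words untouched (it treats them as acronyms).
        return string.Concat(
            words.Select(w => textInfo.ToTitleCase(w.ToLowerInvariant())));
    }
}
```

For example, `NameBuilder.JoinPascalCase("blue", "parrot")` returns `"BlueParrot"`, with no dictionary lookup or guessing involved.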

Jamiec