I'm trying to use CFStringTokenizer with kCFStringTokenizerUnitSentence to split a string into sentences. The first problem I'm having is that a sentence has to start with a capital letter to be recognized as a new sentence. If it doesn't, the tokenizer treats it as part of the previous sentence.
I'm splitting user-entered text, so I expect the input to be messy.
Is there something else I can do with CFStringTokenizer to have it detect uncapitalized sentences? Or will I have to use another method of splitting altogether?
I followed the answer on this SO question for my implementation: How to get an array of sentences using CFStringTokenizer?
NOTE: After testing a bit more, it seems that with kCFStringTokenizerUnitSentence a '!' or a '?' followed by an uncapitalized sentence is still recognized as a sentence boundary. The split also happens when there is no space between the '!' or '?' and the first word of the next sentence.
So the one case I need to work around is a '.' followed by an uncapitalized sentence.
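One way to work around the '.' case might be to capitalize the first letter after each period before handing the text to the tokenizer. The following is only a minimal sketch under stated assumptions: capitalize_after_periods is a hypothetical helper (not part of Core Foundation), it works in place on a mutable C string, and it only uppercases ASCII letters, so other bytes (including UTF-8 multibyte characters) are left untouched. It also capitalizes after abbreviations such as "e.g. this", so it has the same abbreviation edge case as any naive approach.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (an assumption, not a Core Foundation API).
 * Uppercases, in place, the first ASCII lowercase letter that follows
 * a '.' and at least one whitespace character. A period that is followed
 * directly by a non-space character (for example "v1.2") is left alone. */
void capitalize_after_periods(char *text)
{
    bool after_period = false; /* a '.' was seen, with only whitespace since */
    bool saw_space = false;    /* at least one whitespace followed that '.' */

    for (char *p = text; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;

        if (c == '.') {
            after_period = true;
            saw_space = false;
        } else if (after_period && isspace(c)) {
            saw_space = true;
        } else {
            /* First non-space character after the period and whitespace. */
            if (after_period && saw_space && islower(c)) {
                *p = (char)toupper(c);
            }
            after_period = false;
            saw_space = false;
        }
    }
}
```

With this sketch, "hello there. how are you. fine" becomes "hello there. How are you. Fine", while "v1.2 is out" is unchanged. The result could then be converted to a CFString (for example with CFStringCreateWithCString) and passed to the tokenizer; whether the changed capitalization is acceptable in the output depends on the application.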
ANOTHER OPTION I found, if you're getting the text from a UITextField, is to set its autocapitalization type:
textField.autocapitalizationType = UITextAutocapitalizationTypeSentences;
This capitalizes the start of each sentence as the user types, so you don't have to convert the text yourself before passing it to CFStringTokenizer. It still doesn't handle edge cases like abbreviations, but at least in my case the user can undo the auto-capitalization when it's wrong.