This is a really wonderful problem, and it's a shame that it isn't easier to do in Swift today (someday it will be, but not today).
I kind of hate this code, but I'm getting on a plane for 20 hours and don't have time to make it nicer. This may at least get you started using NSMutableString. It'd be nice to work in String, but Swift hates regular expressions, so this is kind of hideous. At least it's a start.
import Foundation
let input = "Hello, World ... I 'm a newbie iOS Developer."
let adjustments = [
    (pattern: "\\s*(\\.\\.\\.|\\.|,)\\s*", replacement: "$1 "), // ellipsis, period, or comma: no leading space, one trailing space
    (pattern: "\\s*'\\s*", replacement: "'"), // apostrophe has no surrounding space
    (pattern: "^\\s+|\\s+$", replacement: ""), // remove leading or trailing space
]
let mutableString = NSMutableString(string: input)
// Apply each adjustment in order, mutating the string in place.
for (pattern, replacement) in adjustments {
    let re = try! NSRegularExpression(pattern: pattern)
    re.replaceMatches(in: mutableString,
                      options: [],
                      range: NSRange(location: 0, length: mutableString.length),
                      withTemplate: replacement)
}
mutableString // "Hello, World... I'm a newbie iOS Developer."
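If you'd rather stay in String, here is a minimal sketch of the same three adjustments using replacingOccurrences(of:with:options:) with the .regularExpression option. The function name tidyPunctuation is made up for illustration; the patterns are the ones above.

```swift
import Foundation

// Hypothetical helper (not part of the code above): applies the same
// regex adjustments, but keeps everything in String.
func tidyPunctuation(_ input: String) -> String {
    let adjustments = [
        (pattern: "\\s*(\\.\\.\\.|\\.|,)\\s*", replacement: "$1 "),
        (pattern: "\\s*'\\s*", replacement: "'"),
        (pattern: "^\\s+|\\s+$", replacement: ""),
    ]
    var result = input
    for (pattern, replacement) in adjustments {
        // .regularExpression makes Foundation treat `of:` as a regex
        // and `with:` as a template, so "$1" works here too.
        result = result.replacingOccurrences(of: pattern,
                                             with: replacement,
                                             options: .regularExpression)
    }
    return result
}

let tidied = tidyPunctuation("Hello, World ... I 'm a newbie iOS Developer.")
print(tidied) // Hello, World... I'm a newbie iOS Developer.
```

This compiles a regex on every call, so for heavy use you would probably want to build the NSRegularExpression objects once and reuse them.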
Regular expressions can be very confusing when you first encounter them. A few hints at reading these:
The specific regular expression language Foundation uses is the one described by ICU.
Backslash (\) means "the next character is special" for a regex. But inside a Swift string, backslash also means "the next character is special" for the string. So you have to double them all.
\s means "a whitespace character"
\s* means "zero or more whitespace characters"
\s+ means "one or more whitespace characters"
$1 means "the thing we matched in parentheses"
| means "or"
^ means "start of string"
$ means "end of string"
. means "any character" so to mean "an actual dot" you have to type "\\." in a Swift string.
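To make the backslash doubling concrete, here is a small sketch. It assumes Swift 5 or later, where raw string literals (#"..."#) let you write the regex without doubling. The sample text is made up for illustration.

```swift
import Foundation

// The same pattern written two ways. In a regular string literal every
// regex backslash is doubled; in a raw literal it is written once.
let escapedPattern = "\\s*(\\.\\.\\.|\\.|,)\\s*"
let rawPattern = #"\s*(\.\.\.|\.|,)\s*"#
print(escapedPattern == rawPattern) // true

// "$1" in the template is whatever the parentheses matched.
let punctuation = try! NSRegularExpression(pattern: rawPattern)
let sample = "one ,two"
let fixed = punctuation.stringByReplacingMatches(
    in: sample,
    options: [],
    range: NSRange(sample.startIndex..., in: sample),
    withTemplate: "$1 ")
print(fixed) // one, two
```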
Notice that I check for both "..." and "." in the same regular expression. You kind of have to do something like that, or else the "." will match three times inside the "...". Another approach would be to first replace "..." with "…" (the single ellipsis character, typed on a Mac by pressing Opt-;). Then "…" is a single-character punctuation mark like the others. (You could also decide to re-expand all ellipses back to dot-dot-dot at the end of the process.)
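A rough sketch of that ellipsis-character approach, with re-expansion at the end. The function name is hypothetical, and it assumes the input never contains a "…" you want to keep as a single character.

```swift
import Foundation

// Hypothetical helper: collapse "..." to "…" so every punctuation mark
// is one character, fix spacing, then expand back to three dots.
func tidyUsingEllipsisCharacter(_ input: String) -> String {
    var s = input.replacingOccurrences(of: "...", with: "…")
    // With one-character punctuation a character class is enough.
    // Inside [...] the dot is literal, so it needs no escaping.
    s = s.replacingOccurrences(of: "\\s*([….,])\\s*",
                               with: "$1 ",
                               options: .regularExpression)
    // Apostrophes get no surrounding space.
    s = s.replacingOccurrences(of: "\\s*'\\s*",
                               with: "'",
                               options: .regularExpression)
    s = s.trimmingCharacters(in: .whitespacesAndNewlines)
    // Optional: re-expand the ellipsis character to dot-dot-dot.
    return s.replacingOccurrences(of: "…", with: "...")
}

let viaEllipsis = tidyUsingEllipsisCharacter("Hello, World ... I 'm a newbie iOS Developer.")
print(viaEllipsis) // Hello, World... I'm a newbie iOS Developer.
```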
Something like this is probably how I'd do it in real life: get it done and shipped. But it may be worth the pain/practice to try to build this as a character-by-character state machine, walking one character at a time and keeping track of your current state.
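For what it's worth, here is one way such a state machine might look. This is only a sketch: the state names and the exact rules are my own choices, and it is not guaranteed to agree with the regex version on every input (for example, it keeps runs of dots together even when they are separated by spaces).

```swift
// Hypothetical states: what the last emitted character obliges us to do
// before emitting the next one.
enum SpacingState {
    case start            // nothing emitted yet; whitespace is skipped
    case inWord           // last emitted an ordinary character
    case pendingSpace     // saw whitespace after a word; not yet emitted
    case afterPunctuation // last emitted "." or ","; one space is owed
    case afterApostrophe  // last emitted "'"; no space is owed
}

func tidyWithStateMachine(_ input: String) -> String {
    var output = ""
    var state = SpacingState.start
    for character in input {
        if character.isWhitespace {
            // Never emit whitespace right away; only remember it after a word.
            if state == .inWord { state = .pendingSpace }
        } else if character == "." || character == "," {
            // Any pending space before punctuation is dropped.
            output.append(character)
            state = .afterPunctuation
        } else if character == "'" {
            output.append(character)
            state = .afterApostrophe
        } else {
            // An ordinary character pays off any owed or pending space.
            if state == .pendingSpace || state == .afterPunctuation {
                output.append(" ")
            }
            output.append(character)
            state = .inWord
        }
    }
    // Owed or pending spaces at the end are never emitted, which trims the tail.
    return output
}

print(tidyWithStateMachine("Hello, World ... I 'm a newbie iOS Developer."))
// Hello, World... I'm a newbie iOS Developer.
```

The nice part of this shape is that each rule lives in exactly one branch, so adding another punctuation mark means touching one condition rather than rewriting a pattern.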