2nd EDIT ALL CAPS / UPPERCASE solution:
Finally got this simpler way from great other helping solutions here and here:
=trim(regexreplace(regexreplace(C15,"(?:([A-Z]{2,}))|.", " $1"), "(\s)([A-Z])","$1 $2"))
From this input:
isn'ter JOHN isn'tar DOE isn'ta or JANE
It returns this output:
JOHN DOE JANE
The Same For Title Case (Extracting All Capitalized / With 1st Letter As Uppercase Words :
Formula:
=trim(regexreplace(regexreplace(C1,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2"))
Input in C1
:
The friendly Quick Brown Fox from the woods Jumps Over the Lazy Dog from the farm.
Output in A1
:
The Quick Brown Fox Jumps Over Lazy Dog
Previous less efficient trials :
I had to custom tailor it that way for my use case:
= ArrayFormula(IF(REGEXMATCH(REGEXREPLACE(N3: N,
"(^[A-Z]).+(,).+(\s[a-z]\s)|(^[A-Z][a-z]).+(\s[a-z][a-z]\s)|(^[A-Z]\s).+(\.\s[A-Z][a-z][a-z]\s)|[A-Z][a-z].+[0-9]|[A-Z][a-z].+[0-9]+|(^[A-Z]).+(\s[A-Z]$)|(^[A-Z]).+(\s[A-Z][a-z]).+(\s[A-Z])|(\s[A-Z][a-z]).+(\s[A-Z]\s).+(\s[A-Z])|(^[A-Z][a-z]).+(\s[A-Z]$)|(\s[A-Z]\s).+(\s[A-Z]\s)|(\s[A-Z]\s)|^[A-Z].+\s[A-Z]((\?)|(\!)|(\.)|(\.\.\.))|^[A-Z]'|^[A-Z]\s|\s[A-Z]'|[A-Z][a-z]|[a-z]{1,}|(^.+\s[A-Z]$)|(\.)|(-)|(--)|(\?)|(\!)|(,)|(\.\.\.)|(\()|(\))|(\')|("
")|(“)|(”)|(«)|(»)|(‘)|(’)|(<)|(>)|(\{)|(\})|(\[)|(\])|(;)|(:)|(@)|(#)|(\*)|(¦)|(\+)|(%)|(¬)|(&)|(|)|(¢)|($)|(£)|(`)|(^)|(€)|[0-9]|[0-9]+",
""), "[A-Z]{2,}") = FALSE, "", REGEXREPLACE(N3: N,
"(^[A-Z]).+(,).+(\s[a-z]\s)|(^[A-Z][a-z]).+(\s[a-z][a-z]\s)|(^[A-Z]\s).+(\.\s[A-Z][a-z][a-z]\s)|[A-Z][a-z].+[0-9]|[A-Z][a-z].+[0-9]+|(^[A-Z]).+(\s[A-Z]$)|(^[A-Z]).+(\s[A-Z][a-z]).+(\s[A-Z])|(\s[A-Z][a-z]).+(\s[A-Z]\s).+(\s[A-Z])|(^[A-Z][a-z]).+(\s[A-Z]$)|(\s[A-Z]\s).+(\s[A-Z]\s)|(\s[A-Z]\s)|^[A-Z].+\s[A-Z]((\?)|(\!)|(\.)|(\.\.\.))|^[A-Z]'|^[A-Z]\s|\s[A-Z]'|[A-Z][a-z]|[a-z]{1,}|(^.+\s[A-Z]$)|(\.)|(-)|(--)|(\?)|(\!)|(,)|(\.\.\.)|(\()|(\))|(\')|("
")|(“)|(”)|(«)|(»)|(‘)|(’)|(<)|(>)|(\{)|(\})|(\[)|(\])|(;)|(:)|(@)|(#)|(\*)|(¦)|(\+)|(%)|(¬)|(&)|(|)|(¢)|($)|(£)|(`)|(^)|(€)|[0-9]|[0-9]+",
"")))
Going one by one over all exceptions and adding their respective regex formulations to the front of the multiple pipes separated regexes in the regexextract function.
@Wiktor Stribiżew any simplifying suggestions would be very welcome.
Found some missing and fixed them.
1st EDIT:
A simpler version though still quite lengthy:
= ArrayFormula(IF(REGEXMATCH(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(
REGEXREPLACE(REGEXREPLACE(P3: P, "[a-z,]",
" "), "-|\.", " "), "(^[A-Z]\s)", " "
), "(\s[A-Z]\s)", " "),
"\sI'|\sI\s|^I'|^I\s|\sI(\.|\?|\!)|\sI$|\sA\s|^A\s|\.\.\.|\.|-|--|,|\?|\!|\.|\(|\)|'|"
"|:|;|\'|“|”|«|»|‘|’|<|>|\{|\}|\[|\]|@|#|\*|¦|\+|%|¬|&|\||¢|$|£|`|^|€|[0-9]|[0-9]+",
" "), "[A-Z]{2,}") = FALSE, " ", REGEXREPLACE(
REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(
P3: P, "[a-z,]", " "), "-|\.", " "),
"(^[A-Z]\s)", " "), "(\s[A-Z]\s)", " "),
"\sI'|\sI\s|^I'|^I\s|\sI(\.|\?|\!)|\sI$|\sA\s|^A\s|\.\.\.|\.|-|--|,|\?|\!|\.|\(|\)|'|"
"|:|;|\'|“|”|«|»|‘|’|<|>|\{|\}|\[|\]|@|#|\*|¦|\+|%|¬|&|\||¢|$|£|`|^|€|[0-9]|[0-9]+",
" ")))
From this example:
Multiple regex matches in Google Sheets formula