1

I'm thinking this should be basic but having tried a number of things I'm nowhere nearer a solution:

I have a list of names and want to extract the initials of the names in the next column:

Name (have) Initials (want)
John Wayne JW
Cindy Crawford CC
Björn Borg BB
Alexandria Ocasio-Cortez AOC
Björk B
Mesut Özil

Note that some of these have non-English letters and they may also include hyphens. Using REGEXMATCH() I've been able to extract the first initial but that's where it stops working for me. For example this should work according to regex101:

=REGEXEXTRACT(AH2, "\b[A-Z]+(?:\s+[A-Z]+)*") but only yields the first letter.

General Grievance
  • 4,555
  • 31
  • 31
  • 45
a-burge
  • 1,535
  • 1
  • 13
  • 25
  • 2
    You can use REGEXREPLACE to replace any non-capital letters with empty string and that will leave you with your desired string. Try using this `=REGEXREPLACE(A1,"[^A-ZÀ-Ö]","")` and you can include additional character sets as per your needs. – Pushpesh Kumar Rajwanshi Aug 24 '22 at 16:57
  • 1
    Thanks @PushpeshKumarRajwanshi, this works phenomenally. I just added one last block at the end to cover a few very uncommon letters from mostly nordic languages `=REGEXREPLACE(O8,"[^A-ZÀ-ÖØ-Þ]","")`. Please consider adding this as an answer for me to accept ;) – a-burge Aug 26 '22 at 10:40

3 Answers3

1

-edit for anyone showing up here-

Please have a look at the comment to the question. It has the best answer to this problem, by using Regexreplace() rather than Regexextract() to target anything that is not an upper case character and applying the relevant unicode pattern.

-/ edit -

Finally realised that RE2 (the flavour of regex used by Google sheets) doesn't support the global modifier and only returns the first match (explained here). The accepted solution to that question finds a smart workaround which allowed me to solve the problem:

=join("",REGEXEXTRACT(AH2,REGEXREPLACE(AH2, "([A-Z]+(?:\s+[A-Z]+)*)","($1)")))

When considering the non-English characters, I got lost in a rabbit hole here and here and decided that I suddenly don't care that much about international characters.

a-burge
  • 1,535
  • 1
  • 13
  • 25
1

You can use REGEXREPLACE to replace any non-capital letters with empty string and that will leave you with your desired string.

Try using this =REGEXREPLACE(A1,"[^A-ZÀ-Ö]","") and you can include additional character sets as per your needs that you want to retain.

Here is a working screenshot.

enter image description here

Pushpesh Kumar Rajwanshi
  • 18,127
  • 2
  • 19
  • 36
0

try:

=INDEX(REGEXREPLACE(A1:A, "[a-z ö-]", ))

enter image description here

player0
  • 124,011
  • 12
  • 67
  • 124
  • Thank you! Although this still misses other unconventional characters, such as french accents (á, é), cedilla (ş, ç), and ß. – a-burge Aug 26 '22 at 10:33
  • Really like the trick you did by using `Index()` to get the array btw. Do you have any documentation on that? – a-burge Aug 26 '22 at 10:34
  • @a-burge you can add them `[a-z öşçáéß-]` and nope for the documentation - as usually, no one cared to document it. maybe see: https://stackoverflow.com/q/73441747/5632629 – player0 Aug 26 '22 at 13:40