
I have recently completed the following programming exercise: Acronym Generator

The statement is:

In nearly every company each employee has a certain acronym containing the first characters of his first and last name(s).

Your task is to write an acronym generator which generates an acronym for a given name. You don't have to care about duplicate acronyms (someone else will do this for you). Note that names can be given in upper or in lower case. The acronym shall always be upper case.

Normally the acronym is always the first letter of your first and the first letter of the last name in upper case.

For example:

Thomas Meyer => TM

martin schmidt => MS

Your company only employs people with a maximum of two first names. If a person has two first names, they might be joined with a dash.

Jan-Erich Schmidt => JES

Jan Erich Mueller => JEM

Last names may also be joined with a dash. No one can have more than two last names.

Paul Meyer-Schmidt => PMS

In Germany, there are last names which have the leading word "von". This shall be abbreviated with a lower case "v":

Paul von Lahnstein => PvL

Martin von Lahnstein-Meyer => MvLM

I have completed the exercise and I am trying to understand other people's answers. I have found one which uses replaceAll and regex. You can see this solution in this link.

public class AcronymGenerator {
  public static String createAcronym(String lastName, String firstName) {
    firstName = firstName.toUpperCase().replaceAll("(.)([A-Z])*([-| ])?(.)?(.)*", "$1$4");
    String von = lastName.toLowerCase().replaceAll("^((v)(on ))?(.)*", "$2");
    lastName = lastName.toUpperCase().replaceAll("(VON )?(.)([A-Z])*([-| ])?(.)?(.)*", "$2$5");
    return firstName+von+lastName;
  }
}

I guess it replaces the first names with their initials in upper case, "von" with "v", and the last names with their initials in upper case. However, I do not understand how regex groups work when they are used within replaceAll.

Could you explain how replaceAll() works with regex groups? I would like to understand how it works:

replaceAll("(.)([A-Z])*([-| ])?(.)?(.)*", "$1$4");
replaceAll("^((v)(on ))?(.)*", "$2");
replaceAll("(VON )?(.)([A-Z])*([-| ])?(.)?(.)*", "$2$5");

I have also read:

Java: Understanding the String replaceAll() method

What is a non-capturing group in regular expressions?

How to Extract people's last name start with "S" and first name not start with "S"

Yone
I’m just wondering how to split a full name correctly into first and last name before calling `createAcronym(String lastName, String firstName)`. Of course it can be done, it just doesn’t seem trivial under the rules given. – Ole V.V. Nov 16 '19 at 13:46

@OleV.V. It is very trivial if you ask the user to input first name in one field and last name in another field. – Andreas Nov 16 '19 at 14:28

1 Answer


In regex, parentheses () are used to define a group. If the first character inside the parentheses is a ?, then it's a non-capturing group¹, otherwise it's a capturing group.

¹ Except that (?<name>X) is a named capturing group.

Examples:

A(FOO)B will match the string "AFOOB" and will capture the string "FOO".

A(?:FOO)B will match the string "AFOOB" and will not capture anything.
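A minimal sketch of that difference, using only java.util.regex. The class name GroupDemo and its two helper methods are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Number of capturing groups declared in the pattern (hypothetical helper).
    public static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 when the whole input matches, otherwise null (hypothetical helper).
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(countGroups("A(FOO)B"));          // 1
        System.out.println(countGroups("A(?:FOO)B"));        // 0
        System.out.println(firstGroup("A(FOO)B", "AFOOB"));  // FOO
        System.out.println("AFOOB".matches("A(?:FOO)B"));    // true
    }
}
```

Both patterns match the same input; only the capturing version makes "FOO" available afterwards as group 1.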

Since a regex can have more than one capturing group, the groups are numbered by the position of their opening parenthesis, counting from left to right, so the first ( starts capturing group 1.

Example: A(X)B(?:Y)C(Z) will match the string "AXBYCZ" and will capture "X" as group 1, and "Z" as group 2. Non-capturing groups don't count.
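The numbering can be checked by printing every group after a match. This is a sketch under the assumption that the whole input should match; the class GroupNumbering and its groups helper are hypothetical names. The second pattern is the "von" regex from the question, which shows how nested groups are numbered:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns groups 1..n of a full match, or an empty array if the input does not match.
    public static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            result[i - 1] = m.group(i); // null if the group did not take part in the match
        }
        return result;
    }

    public static void main(String[] args) {
        // The non-capturing (?:Y) is skipped in the numbering.
        System.out.println(Arrays.toString(groups("A(X)B(?:Y)C(Z)", "AXBYCZ")));
        // prints [X, Z]

        // Nested groups: the outer ( is group 1, (v) is group 2, (on ) is group 3, (.) is group 4.
        System.out.println(Arrays.toString(groups("^((v)(on ))?(.)*", "von lahnstein")));
        // prints [von , v, on , n]
    }
}
```

Group 4 ends up as "n" because a group inside a repetition such as (.)* only keeps the text from its last iteration.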

So, in your example:

replaceAll("(.)([A-Z])*([-| ])?(.)?(.)*", "$1$4");
            ↑  ↑       ↑       ↑   ↑
            1  2       3       4   5

If a group is optional (directly, as here, or because it sits inside a bigger optional group) and does not take part in the match, then its captured value is null when retrieved by calling group(n) on the Matcher, or an empty string when referenced using the $n syntax in a replacement value (as shown here).
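A short sketch of both behaviours, using the first-name regex from the question. The class OptionalGroupDemo and its methods are illustrative names, not part of the original solution:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    static final String REGEX = "(.)([A-Z])*([-| ])?(.)?(.)*";

    // Value of group 4 after a full match; null if the group did not take part (or no match).
    public static String group4(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.matches() ? m.group(4) : null;
    }

    // Same replacement as in the question: an unmatched $4 contributes nothing.
    public static String initials(String input) {
        return input.replaceAll(REGEX, "$1$4");
    }

    public static void main(String[] args) {
        System.out.println(group4("THOMAS"));      // null
        System.out.println(initials("THOMAS"));    // T
        System.out.println(group4("JAN-ERICH"));   // E
        System.out.println(initials("JAN-ERICH")); // JE
    }
}
```

With a single first name, groups 3 to 5 never match anything, so "$1$4" collapses to just the first letter.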

The regex above is actually capturing more than it needs, and it incorrectly uses | inside a character class (where | is a literal character, not an alternation), so it would be better written as:

replaceAll("(.)[A-Z]*[- ]?(.)?.*", "$1$2");
            ↑             ↑
            1             2
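The same cleanup can be applied to the other two calls. The sketch below is one possible rewrite, not the original author's code: the class name SimplifiedAcronym is made up, and the "von" and last-name patterns are my own assumption of how the simplification carries over, with non-capturing groups (?:...) wherever the captured text is not used.

```java
public class SimplifiedAcronym {
    public static String createAcronym(String lastName, String firstName) {
        // Group 1: first letter; group 2: first letter after an optional dash or space.
        String first = firstName.toUpperCase()
                .replaceAll("(.)[A-Z]*[- ]?(.)?.*", "$1$2");
        // Group 1 is the "v" of a leading "von "; it stays empty when there is none.
        String von = lastName.toLowerCase()
                .replaceAll("^(?:(v)on )?.*", "$1");
        // Skip an optional "VON " prefix without capturing it, then take the initials.
        String last = lastName.toUpperCase()
                .replaceAll("(?:VON )?(.)[A-Z]*[- ]?(.)?.*", "$1$2");
        return first + von + last;
    }

    public static void main(String[] args) {
        System.out.println(createAcronym("Meyer", "Thomas"));               // TM
        System.out.println(createAcronym("Schmidt", "Jan-Erich"));          // JES
        System.out.println(createAcronym("Mueller", "Jan Erich"));          // JEM
        System.out.println(createAcronym("Meyer-Schmidt", "Paul"));         // PMS
        System.out.println(createAcronym("von Lahnstein", "Paul"));         // PvL
        System.out.println(createAcronym("von Lahnstein-Meyer", "Martin")); // MvLM
    }
}
```

Because only the groups that appear in the replacement are capturing, the $n numbers now line up directly with the parts of the name that end up in the acronym.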
Andreas