The requirement is to match people's last names, with a dash separating each last name.
The base RegEx I am using is:
(?=\S*[-])([a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+)
Basically, I am limiting it to Latin alphabet characters, including some accented characters.
This works perfectly fine if I use examples like:
- Pérez-González
- Domínguez-Díaz
- Güemez-Martínez
But I forgot to account for the case where the person has only one last name.
I tried the following:
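For reference, here is a minimal sketch of how the base pattern behaves in JavaScript on the examples above and on a single last name. The helper name `matchesWhole` is only for illustration, and the whole-input comparison is an assumption, since the pattern itself is unanchored.

```javascript
// Base pattern from above, unchanged. It is unanchored, so exec() can
// return a partial match; the helper below compares against the full input.
const base = /(?=\S*[-])([a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+)/;

// Hypothetical helper: true only when the match covers the whole input.
function matchesWhole(re, input) {
  const m = re.exec(input);
  return m !== null && m[0] === input;
}

const samples = ["Pérez-González", "Domínguez-Díaz", "Güemez-Martínez", "Pérez"];
const baseResults = samples.map((s) => matchesWhole(base, s));

// The three dashed names match; the single last name does not,
// because the lookahead requires a dash somewhere in the word.
console.log(baseResults); // [ true, true, true, false ]
```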
((?=\S*[-])([\ a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+))|([A-Za-zÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù']+)
I added an escaped space (\ ) to the allowed characters of the first alternative, plus an alternation (|) for a single word without spaces.
While it works for some cases, there are two issues:
- I don't think it's the most optimal RegEx for a use case like this.
- I stumbled upon the specific case of people who have multi-word last names.
Regarding the second issue, I mean something like:
- Johnson-De Sosa
The RegEx matches it, but only because spaces are now allowed anywhere in the match, so the dash is no longer enforced as the separator between last names.
I am not sure how to handle this.
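To make the problem concrete, this sketch runs the second pattern against a few inputs and prints what it actually captures. The helper `firstMatch` and the input "Pérez-González Díaz Ruiz" are made up for illustration; the other inputs come from this question.

```javascript
// Second attempt from above, unchanged (no "u" flag, so "\ " is a plain space).
const attempt = /((?=\S*[-])([\ a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+))|([A-Za-zÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù']+)/;

// Hypothetical helper: returns the text of the first match, or null.
function firstMatch(re, input) {
  const m = re.exec(input);
  return m === null ? null : m[0];
}

// The whole string is captured, spaces included.
const compound = firstMatch(attempt, "Johnson-De Sosa");

// Also captured whole: once the first token contains a dash,
// any number of space-separated words is accepted after it.
const tooLoose = firstMatch(attempt, "Pérez-González Díaz Ruiz");

// Only the first word is captured: the lookahead finds no dash before
// the first space, so the single-word alternative takes over.
const partial = firstMatch(attempt, "Pérez De la Cruz-González");

console.log(compound, "|", tooLoose, "|", partial);
```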
What I am thinking is to limit the number of spaces inside a single last name, say at most 2 or 3, so that examples like:
- Pérez-De la Cruz (this works with my RegEx)
- Pérez De la Cruz-González (this doesn't)
can be valid matches.
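One possible sketch of that idea, not necessarily the most optimal pattern: treat each last name as one word followed by a bounded number of space-separated words, and join last names with a single dash. The names `LETTERS`, `MAX_SPACES`, `WORD`, `LAST_NAME` and `lastNames` are all made up for this example, the limit of 3 spaces is an assumption taken from the "2 or 3" above, and the anchors assume the whole input is the last-name field.

```javascript
// Same letter set as the patterns above (apostrophe included, dash excluded).
const LETTERS = "A-Za-zÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'";

// Assumed limit: at most 3 spaces inside one last name.
const MAX_SPACES = 3;

// One word, then up to MAX_SPACES further words, each preceded by one space.
const WORD = `[${LETTERS}]+`;
const LAST_NAME = `${WORD}(?: ${WORD}){0,${MAX_SPACES}}`;

// One last name, optionally followed by more last names joined by a dash.
const lastNames = new RegExp(`^${LAST_NAME}(?:-${LAST_NAME})*$`);

const checks = [
  "Pérez",                     // single last name
  "Pérez-González",            // two last names
  "Johnson-De Sosa",           // multi-word second last name
  "Pérez-De la Cruz",
  "Pérez De la Cruz-González", // multi-word first last name
  "Pérez-",                    // trailing dash
  "Pérez--González",           // doubled dash
  "Pérez  González",           // doubled space
];
const verdicts = checks.map((s) => lastNames.test(s));
console.log(verdicts);
```

The first five inputs pass and the last three are rejected. Because the space is only allowed between words of the same last name, the dash stays the only separator between last names, and no lookahead is needed.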
I am no pro at RegEx, so any help would be greatly appreciated.
UPDATE
I failed to mention that I need to be able to use this with JavaScript. PHP support would be useful too, but I am doing browser-side validation, so the patterns need to be compatible.