I have a bunch of strings such as:

Super Mario Bros. 8 (En,Fr,De,Es,It)
Donald Duck in Whacky Land (En,Fr,De,Es,Sv)
Toadstool Adventures 3D (En)
Chinaland (En,De)
A title which doesn't have any such thing
...

That is, a title of a product followed by (sometimes) a list of one or more language codes in parentheses.

I am struggling to come up with a (PCRE) regexp that removes these from the strings safely, that is, without being likely to touch the titles.

I know that ([A-Z]{1}[a-z]{1}) must be involved somewhere, to match a single language code such as "It" or "De", but how to handle any number of them in a row, separated by commas (or with no comma when there is just one), is beyond my regular expression skills.

I really wish that they had used some kind of unambiguous separator between the title part and the "metadata" part of the filenames... Then I wouldn't need to do all this manual trial-and-error removal. But they didn't.

user379490

2 Answers

Something like this would do it:

\([A-Z][a-z](?:,[A-Z][a-z])*\)$

https://regex101.com/r/xxNQ8h/1
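As a rough sketch of how that pattern might be applied, here is a small Python example. Python's `re` module is not PCRE, but the pattern only uses syntax common to both. The leading `\s*` is my own addition (an assumption, not part of the answer) so that the space before the parenthesis is removed too, and the helper name `strip_language_codes` is hypothetical.

```python
import re

# Pattern from the answer, with an assumed leading \s* so that the space
# separating the title from the language list is stripped as well.
LANG_SUFFIX = re.compile(r"\s*\([A-Z][a-z](?:,[A-Z][a-z])*\)$")


def strip_language_codes(title: str) -> str:
    """Return the title without a trailing "(En,Fr,...)" group, if one is present."""
    return LANG_SUFFIX.sub("", title)


titles = [
    "Super Mario Bros. 8 (En,Fr,De,Es,It)",
    "Toadstool Adventures 3D (En)",
    "A title which doesn't have any such thing",
]

for title in titles:
    print(strip_language_codes(title))
```

Because the pattern is anchored with `$` and requires exactly one uppercase letter followed by one lowercase letter per code, a parenthesised group such as `(USA)` or one in the middle of a title is left alone.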

MonkeyZeus

Try it like this:

\(([A-Z][a-z],?)+\).*$
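To see how this pattern behaves compared with the stricter `\([A-Z][a-z](?:,[A-Z][a-z])*\)$`, here is a small Python sketch (again relying only on syntax shared by PCRE and Python's `re`). The sample strings with a trailing `[b1]` tag and a comma-less `(EnDe)` are hypothetical inputs I made up for illustration, and `compare` is a hypothetical helper.

```python
import re

# Strict: codes must be comma-separated and the group must end the string.
STRICT = re.compile(r"\([A-Z][a-z](?:,[A-Z][a-z])*\)$")
# Loose: the comma is optional, and anything after the group is removed too.
LOOSE = re.compile(r"\(([A-Z][a-z],?)+\).*$")


def compare(text: str) -> tuple[str, str]:
    """Return (result with STRICT, result with LOOSE) after removing matches."""
    return STRICT.sub("", text), LOOSE.sub("", text)


samples = [
    "Chinaland (En,De)",
    "Chinaland (En,De) [b1]",  # hypothetical trailing tag after the codes
    "Chinaland (EnDe)",        # hypothetical list without a comma
]

for sample in samples:
    strict_result, loose_result = compare(sample)
    print(f"{sample!r}: strict={strict_result!r} loose={loose_result!r}")
```

The loose pattern removes the group in all three samples (along with whatever follows it), while the strict one only touches the first. Which behaviour is preferable depends on whether other metadata can follow the language list in the real filenames. Note that neither pattern, as given, strips the space before the parenthesis.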


wp78de