
I have a title and want to check whether it is a Director's Cut title. I'm doing that in the following way:

# Terms are kept lowercase because they are matched against title.lower().
VERSIONS = {
    "DIRECTOR'S CUT": ["director's cut", "directors cut", "director’s cut", "versão do diretor", "director's edition", "montaje del director", "director corte",
                       "version du réalisateur", "directors' cut", "director edition", "dictator's cut", "ディレクターズカット", "director´s cut", "режиссерская версия", "감독판"],
}

title = title.lower()

is_director = False
for term in VERSIONS["DIRECTOR'S CUT"]:
    # Substring test against the lowercased title
    if term in title:
        is_director = True
        break

This works fine. However, I'd like to add more version types, as well as many more patterns (in different languages) for each version type. Is there a more performant way to do that than running a for loop for each version I want to check? Given 1 million names and 1000 terms across the various versions, I'm afraid this may become a bottleneck if I loop over every term.

  • You could try turning your array into a regular expression, e.g. `director['’]?s (cut|edition)|versão do diretor` – Barmar Jan 05 '19 at 00:13
  • @Barmar I have a lot of odd characters in there, it would still be fine? Spaces, apostrophes, commas, parentheticals, foreign characters, etc. –  Jan 05 '19 at 00:17
  • If you really have thousands of terms, trying to merge them like that will be impractical. Just create a big regexp as shown in the linked question. The regexp engine should optimize it. – Barmar Jan 05 '19 at 00:19
  • It seems like you're trying to do natural language processing. There are probably better ways to do this than making enormous lists of phrases. – Barmar Jan 05 '19 at 00:20
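
The "one big regexp" idea from the comments could be sketched roughly as follows. This is only an illustration under assumptions: the `EXTENDED` entry, the `TERM_TO_VERSION` lookup, and the `detect_version` helper are hypothetical names that do not appear in the question. Each term is passed through `re.escape`, so apostrophes, parentheses, and other odd characters are matched literally, and a single compiled pattern is searched once per title instead of looping over every term.

```python
import re

# Hypothetical, shortened term table; the real one would list every version type.
VERSIONS = {
    "DIRECTOR'S CUT": ["director's cut", "directors cut", "director’s cut", "versão do diretor"],
    "EXTENDED": ["extended cut", "extended edition", "versión extendida"],
}

# Reverse lookup: matched term -> version label.
TERM_TO_VERSION = {
    term.lower(): version
    for version, terms in VERSIONS.items()
    for term in terms
}

# Longest terms first, so a longer term wins over a shorter one that is its prefix.
# re.escape makes every term match literally, whatever characters it contains.
PATTERN = re.compile(
    "|".join(re.escape(term) for term in sorted(TERM_TO_VERSION, key=len, reverse=True))
)


def detect_version(title):
    """Return the version label of the first term found in title, or None."""
    match = PATTERN.search(title.lower())
    return TERM_TO_VERSION[match.group(0)] if match else None


print(detect_version("Blade Runner (Director's Cut)"))
print(detect_version("Alien"))
```

Whether this is actually faster than the nested loop for 1000 terms and 1 million titles depends on the regex engine and the data, so it is worth timing on a realistic sample before relying on it.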
