
I'm trying to make a substring optional. Here is the source:

Movie TOTO S09 E22 2022 Copyright

I want to optionally capture the substring: S09 E22

What I have tried so far:

/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi

The problem is that group 2 ends up matching TOTO S09 E22 2022 Copyright, and the optional group never captures just S09 E22:

Match 1: 0-33  Movie TOTO S09 E22 2022 Copyright
Group 1: 0-5   Movie
Group 2: 5-33  TOTO S09 E22 2022 Copyright
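A minimal way to reproduce this outside a regex tester, assuming Python's `re` module as a stand-in for the tester's flavor. The /i and /m flags map to `re.IGNORECASE` and `re.MULTILINE`; Python has no /g flag, so a single `search` is used here:

```python
import re

# The pattern from the question. /i and /m map to re.IGNORECASE and re.MULTILINE;
# Python has no /g flag (finditer/findall are used when every match is wanted).
pattern = re.compile(r"(Movie)(.*)(S\d\d\s*E\d\d)?", re.IGNORECASE | re.MULTILINE)

source = "Movie TOTO S09 E22 2022 Copyright"
m = pattern.search(source)

print(m.span())          # (0, 33)
print(m.group(1))        # Movie
print(repr(m.group(2)))  # ' TOTO S09 E22 2022 Copyright'
# The optional group matched zero times, so it did not participate at all.
print(m.group(3))        # None
```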

Is there any way to fix this issue?


Prakein

4 Answers


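You get that match because the .* is greedy and first matches all the way to the end of the string.

Then your (S\d\d\s*E\d\d)? is optional, so it can match zero times at the end of the string. The overall match already succeeds there, so the engine never backtracks into .* to give S09 E22 back.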
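Making the dot lazy on its own does not fix this either, because the optional group is then happy to match nothing right after Movie. A quick comparison, again assuming Python's `re` module as a stand-in for the original regex flavor:

```python
import re

source = "Movie TOTO S09 E22 2022 Copyright"

greedy = re.search(r"(Movie)(.*)(S\d\d\s*E\d\d)?", source)
lazy = re.search(r"(Movie)(.*?)(S\d\d\s*E\d\d)?", source)

# Greedy: .* consumes the rest of the line, then the optional group matches
# zero times, so the match succeeds without ever backtracking.
print(greedy.groups())  # ('Movie', ' TOTO S09 E22 2022 Copyright', None)

# Lazy: .*? first tries zero characters; the optional group fails at " TOTO",
# is skipped, and the match succeeds immediately after "Movie".
print(lazy.groups())    # ('Movie', '', None)
```

This is why the answers below either stop the title part before the S09 E22 part explicitly or force the lazy part to run to the end of the string.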

If you don't want partial matches for S09 or E22, the 4 digits for the year are not mandatory, and movie titles can be longer than one word, with PCRE you could use:

\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
  • \b(Movie)\b\h+ Capture the word Movie, followed by one or more horizontal whitespace chars
  • ( Capture group
    • (?: Non-capture group to repeat as a whole
      • (?!\h+[SE]\d+\b). Match any character, as long as whitespace followed by an S01 or E22 part is not directly to the right (where [SE] matches either an S or E char, and \h matches a horizontal whitespace char)
    • )* Close the non-capture group and optionally repeat it
  • ) Close capture group
  • (?:\h(S\d+\h+E\d+))? Optionally match a whitespace char and capture the S01 E22 part (where \d+ matches 1 or more digits)

Regex demo
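A sketch of the same pattern in Python, assuming Python's `re` module is the target. It has no `\h`, so `[ \t]` is used in its place. The two sample titles besides the one from the question are hypothetical:

```python
import re

# Python's re does not support PCRE's \h, so [ \t] stands in for it.
pattern = re.compile(
    r"\b(Movie)\b[ \t]+((?:(?![ \t]+[SE]\d+\b).)*)(?:[ \t](S\d+[ \t]+E\d+))?"
)

samples = [
    "Movie TOTO S09 E22 2022 Copyright",  # from the question
    "Movie TOTO 2022 Copyright",          # hypothetical: no episode part
    "Movie Star Wars S01 E02 1977",       # hypothetical: multi-word title
]

results = [pattern.search(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('Movie', 'TOTO', 'S09 E22')
# ('Movie', 'TOTO 2022 Copyright', None)
# ('Movie', 'Star Wars', 'S01 E02')
```

The negative lookahead stops the title group just before the whitespace that precedes the episode part, so the optional group gets a chance to capture it.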

Another option is to capture the S01 E22 part in its own group, or else match the rest of the line:

\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?

Regex demo
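A Python sketch of this second pattern, under the same assumptions: `\h` becomes `[ \t]`, and the possessive `*+` is replaced by a plain `*` because Python's `re` only supports possessive quantifiers from Python 3.11. Since the final group is optional, the first attempt already succeeds and nothing backtracks into the repetition, so the captures are the same for these samples (the extra titles are hypothetical):

```python
import re

# \h -> [ \t] (not supported by Python's re); possessive *+ -> plain *.
pattern = re.compile(
    r"\b(Movie)[ \t]+([^S\n]*(?:S(?!\d+[ \t]+E\d+\b)[^S\n]*)*)(S\d+[ \t]+E\d+)?"
)

samples = [
    "Movie TOTO S09 E22 2022 Copyright",  # from the question
    "Movie TOTO 2022 Copyright",          # hypothetical: no episode part
    "Movie Star Wars S01 E02 1977",       # hypothetical: title containing an S
]

results = [pattern.search(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('Movie', 'TOTO ', 'S09 E22')
# ('Movie', 'TOTO 2022 Copyright', None)
# ('Movie', 'Star Wars ', 'S01 E02')
```

Note that the title group keeps the trailing space before the episode part; strip it afterwards if needed.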

The fourth bird

Based on your shown samples and attempts, please try the following regex.

^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))

Here is an online demo of the regex.

Explanation: a detailed breakdown of the regex above.

^Movie\s+\S+\s+  ##Match Movie at the start of the value, followed by whitespace, non-whitespace chars, and whitespace.
(S\d{2}\s+E\d{2} ##Open the one and only capturing group, which matches:
                 ##S followed by 2 digits, whitespace, then E followed by 2 digits.
  (?=\s+\d{4})   ##Positive lookahead: make sure the above is followed by whitespace and 4 digits.
)                ##Close the capturing group.
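A quick check in Python's `re` (used here as an assumed stand-in for the demo's flavor) shows how this pattern behaves on the question's sample and on two hypothetical variations:

```python
import re

pattern = re.compile(r"^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))")

samples = [
    "Movie TOTO S09 E22 2022 Copyright",  # from the question
    "Movie TOTO 2022 Copyright",          # hypothetical: no episode part
    "Movie Star Wars S01 E02 1977",       # hypothetical: multi-word title
]

matches = [pattern.search(s) for s in samples]
for m in matches:
    # Print the captured episode part, or None when the whole match fails.
    print(m.group(1) if m else None)
# S09 E22
# None
# None
```

Because the episode part is not optional here and \S+ only covers a single word, the last two samples produce no match at all.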
RavinderSingh13

One idea is to make the dot lazy (.*?) and force it to match up to the end of the string ($) if the other part doesn't exist.

Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)

See this demo at regex101 (I also added some \s* around the captures to absorb the spaces).
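The same pattern checked in Python's `re`, assumed here as a stand-in for regex101's flavor; the titles other than the question's are hypothetical:

```python
import re

# Lazy title part; the alternation either captures the episode or matches $.
pattern = re.compile(r"Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)")

samples = [
    "Movie TOTO S09 E22 2022 Copyright",  # from the question
    "Movie TOTO 2022 Copyright",          # hypothetical: no episode part
    "Movie Star Wars S01 E02 1977",       # hypothetical: multi-word title
]

results = [pattern.search(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('TOTO', 'S09 E22')
# ('TOTO 2022 Copyright', '')
# ('Star Wars', 'S01 E02')
```

When the episode part is absent, group 2 is the empty string rather than None, because the `$` alternative matched.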

bobble bubble

There are several errors in your regex:

  • The blank space after Movie is not accounted for.
  • (.*) is greedy and matches everything after Movie.

Try online at https://regex101.com/

(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)
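A check of this pattern in Python's `re`, assumed as a stand-in for regex101's flavor; the titles other than the question's are hypothetical:

```python
import re

pattern = re.compile(r"(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)")

samples = [
    "Movie TOTO S09 E22 2022 Copyright",  # from the question
    "Movie TOTO 2022 Copyright",          # hypothetical: no episode part
    "Movie Star Wars S01 E02 1977",       # hypothetical: multi-word title
]

results = [pattern.search(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('Movie ', 'TOTO ', 'S09 E22 ', '2022 Copyright')
# ('Movie ', 'TOTO ', None, '2022 Copyright')
# ('Movie ', 'Star ', None, 'Wars S01 E02 1977')
```

With a multi-word title, \w*\s* only takes the first word, so the optional group is skipped and the episode part ends up in the last group.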
Azhar Khan