First of all, although your regular expression had mistakes and therefore did not compile, I must congratulate you on a valiant effort. It is difficult to get a regex 100% right from the start, even for cases that look innocuous and straightforward. With minor corrections it will extract the desired information from your strings, assuming that the delimiters are dots ('.') as in your example and that the season and episode are given in the exact SXXEXX format. Here is the corrected version of the pattern: "\\.S(\\d{2})E(\\d{2})\\."
After a successful call to m.find(), you can access the captured groups by calling m.group(1) and m.group(2) for season and episode respectively. Quoting from the java.util.regex.Matcher javadoc:
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
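As a minimal sketch of how the corrected pattern might be used (the class name SimplePatternDemo and the helper extract() are my own names, purely for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplePatternDemo {
    // The corrected pattern: ".S", two digits, "E", two digits, "."
    static final Pattern SXXEXX = Pattern.compile("\\.S(\\d{2})E(\\d{2})\\.");

    // Hypothetical helper: returns {season, episode}, or null when the pattern is not found.
    static int[] extract(String name) {
        Matcher m = SXXEXX.matcher(name);
        // find() looks for the pattern anywhere in the input;
        // matches() would require the whole input to match.
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] info = extract("Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew");
        System.out.println("season=" + info[0] + ", episode=" + info[1]);
        // prints: season=6, episode=1
    }
}
```

Note that this strict pattern only works when the SXXEXX token is surrounded by dots; a name such as "Galactic Wars - S01E02 - A dire development" is not matched by it.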
To make this more instructive, I have written a singleton (a class of which only one instance is possible), following the advice in Effective Java, p. 17 (Bloch J., 2nd ed., 2008). The instance of the class, which is accessed with the getInstance() method, exposes the parse() method, which takes a string containing the series information you want to extract and parses it, saving the season and episode numbers to the respective private integer fields. Finally, as a test, we parse an array of challenging episode names from various (fictional) series, including your own example, and check that we get the season and episode numbers. IMHO this example illustrates in a succinct way not only a broader version of what you are trying to achieve, but also:
- an effective approach to reusing a compiled pattern
- a less restrictive pattern than the one you were trying to match (e.g. "S", "s", "Season", "SEASON", "season" are all acceptable variants for matching the season keyword)
- how to use lookarounds and word boundaries: (?<= and (?= and \b
- how to use named capturing groups with the (?<name>X) syntax (caveat: requires Java 7 or later, see this older question for more information)
- interesting cases of how to use the Pattern and Matcher classes. You can also take a look at this very educational tutorial from Oracle, The Java Tutorials: Regular Expressions
- how to create and use singletons
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class begin
public class SeriesInfoMatcher {

    private int season, episode;

    // Pattern.COMMENTS makes the regex engine ignore the whitespace inside the pattern
    static final String SEASON_EPISODE_PATTERN = "(?<=\\b|_) s(?:eason)? (?<season>\\d+) e(?:pisode)? (?<episode>\\d+) (?=\\b|_)";
    private final Pattern pattern = Pattern.compile(SEASON_EPISODE_PATTERN, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    private Matcher matcher;
    private String seriesInfoString;

    private static SeriesInfoMatcher instance;

    private SeriesInfoMatcher() {
        resetFields();
    }

    // Lazily creates the single instance and stores it, so later calls return the same object.
    // Not thread-safe, which is acceptable for this single-threaded demo.
    public static SeriesInfoMatcher getInstance() {
        if (instance == null) {
            instance = new SeriesInfoMatcher();
        }
        return instance;
    }

    /**
     * Analyzes a string containing series information and updates the internal fields accordingly
     * @param unparsedSeriesInfo The string containing episode and season numbers to be extracted. Must not be null or empty.
     */
    public void parse(String unparsedSeriesInfo) {
        try {
            if (unparsedSeriesInfo == null || unparsedSeriesInfo.isEmpty()) {
                throw new IllegalArgumentException("String argument must be non-null and non-empty!");
            }
            // Discard the results of any previous call before parsing the new string
            resetFields();
            seriesInfoString = unparsedSeriesInfo;
            initMatcher();
            // If the string contains several candidates, the last one wins
            while (matcher.find()) {
                season = Integer.parseInt(matcher.group("season"));
                episode = Integer.parseInt(matcher.group("episode"));
            }
        }
        catch (Exception ex) {
            resetFields();
            System.err.printf("Invalid series info string format. Make sure there is a substring of \"%s\" format.%n%s%n", "S{NUMBER}E{NUMBER}", ex.getMessage());
        }
    }

    // Reuses the existing Matcher (and its compiled Pattern) instead of creating a new one per call
    private void initMatcher() {
        if (matcher == null) {
            matcher = pattern.matcher(seriesInfoString);
        }
        else {
            matcher.reset(seriesInfoString);
        }
    }

    private void resetFields() {
        seriesInfoString = "";
        season = -1;
        episode = -1;
    }

    @Override
    public String toString() {
        return seriesInfoString.isEmpty() ?
            "<no information to display>" :
            String.format("{\"%s\": %d, \"%s\": %d}", "season", season, "episode", episode);
    }

    public static void main(String[] args) {
        // Example series info strings
        String[] episodesFromVariousSeries = {
            "Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew",
            "Galactic Wars - S01E02 - A dire development",
            "A.dotted.hell.season3episode15.when.enough.is.enough.XVID",
            "The_underscore_menace_-_The_horror_at_the_end!_[2012]_s05e02",
            "s05e01_-_The_underscore_menace_-_Terror_at_the_beginning_[2012]"
        };
        // Obtain the singleton through its accessor; the constructor is private by design
        SeriesInfoMatcher seriesMatcher = SeriesInfoMatcher.getInstance();
        System.out.printf("%-80s %-20s%n", "Episode Info", "Parsing Results");
        for (String episode : episodesFromVariousSeries) {
            seriesMatcher.parse(episode);
            System.out.printf("%-80s %-20s%n", episode, seriesMatcher);
        }
    }
}
The output of main() is:
Episode Info                                                                     Parsing Results
Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew                                    {"season": 6, "episode": 1}
Galactic Wars - S01E02 - A dire development                                      {"season": 1, "episode": 2}
A.dotted.hell.season3episode15.when.enough.is.enough.XVID                        {"season": 3, "episode": 15}
The_underscore_menace_-_The_horror_at_the_end!_[2012]_s05e02                     {"season": 5, "episode": 2}
s05e01_-_The_underscore_menace_-_Terror_at_the_beginning_[2012]                  {"season": 5, "episode": 1}
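One detail of the pattern that is easy to miss: the underscore counts as a word character, so \b alone does not match between "_" and "s". That is why the lookarounds in the class above also accept "_". Here is a small sketch comparing the two variants; the class name BoundaryDemo, the constant names and the helper found() are hypothetical, used only for illustration, and the patterns are simplified to the short "s..e.." form.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Variant that relies on word boundaries only
    static final Pattern WORD_BOUNDARY_ONLY =
            Pattern.compile("\\bs(\\d+)e(\\d+)\\b", Pattern.CASE_INSENSITIVE);

    // Variant that also accepts an underscore on either side
    static final Pattern WITH_UNDERSCORE =
            Pattern.compile("(?<=\\b|_)s(\\d+)e(\\d+)(?=\\b|_)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern occurs anywhere in the input
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String dotted = "Some.Cool.Series.S06E01.720p";
        String underscored = "[2012]_s05e02";

        System.out.println(found(WORD_BOUNDARY_ONLY, dotted));      // true
        System.out.println(found(WORD_BOUNDARY_ONLY, underscored)); // false: no boundary between '_' and 's'
        System.out.println(found(WITH_UNDERSCORE, dotted));         // true
        System.out.println(found(WITH_UNDERSCORE, underscored));    // true
    }
}
```

The lookarounds are zero-width, so the delimiters themselves are never consumed by the match, which is what lets the same pattern work at the very start or end of a string as well.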