I've done a bunch of searching, but I'm terrible with regular expressions and my google-fu has not been strong in this instance.
Scenario:
In push notifications, we're passed a URL that contains a 9-digit content ID.
Example URL: http://www.something.com/foo/bar/Some-title-Goes-here-123456789.html
(123456789 is the content ID in this scenario)
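The parsing method takes the URL already split into a path and a query string. Assuming the push payload delivers the full URL as a plain string, `java.net.URI` can do that split. `PushUrlParts` and `pathAndQuery` are hypothetical names used only for this sketch.

```java
import java.net.URI;

public class PushUrlParts {
    // Hypothetical helper: splits a push URL into the (path, queryString) pair
    // that a method like getContentIdFromPathAndQueryString expects.
    static String[] pathAndQuery(String url) {
        URI uri = URI.create(url);
        // getQuery() returns null when the URL has no "?..." part.
        return new String[] { uri.getPath(), uri.getQuery() };
    }

    public static void main(String[] args) {
        String[] parts = pathAndQuery("http://www.something.com/foo/bar/Some-title-Goes-here-123456789.html");
        System.out.println(parts[0]); // /foo/bar/Some-title-Goes-here-123456789.html
        System.out.println(parts[1]); // null (no query string in this URL)
    }
}
```

On Android, `android.net.Uri` offers equivalent `getPath()` and `getQuery()` accessors; `java.net.URI` is used here only so the sketch runs off-device.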
Current regex to parse the content ID:
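import android.util.Log;
import com.crashlytics.android.answers.Answers;
import com.crashlytics.android.answers.CustomEvent;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// StringUtils (isNonEmpty/isEmpty) and LOG_TAG are assumed to be defined elsewhere in the project.
public String getContentIdFromPathAndQueryString(String path, String queryString) {
    String contentId = null;
    if (StringUtils.isNonEmpty(path)) {
        // Exactly nine digits immediately followed by ".html" (the dot is escaped).
        Pattern p = Pattern.compile("(\\d{9})(?=\\.html)");
        Matcher m = p.matcher(path);
        if (m.find()) {
            contentId = m.group(1);
        } else if (StringUtils.isNonEmpty(queryString)) {
            p = Pattern.compile("(?:contentId=)(\\d{9})(?=\\.html)");
            m = p.matcher(queryString);
            if (m.find()) {
                // group(1) is just the digits; group() would also include "contentId=".
                contentId = m.group(1);
            }
        }
    }
    Log.d(LOG_TAG, "Content id " + (contentId == null ? "not found" : ("found - " + contentId)));
    if (StringUtils.isEmpty(contentId)) {
        Answers.getInstance().logCustom(new CustomEvent("eid_url")
                .putCustomAttribute("contentId", "empty")
                .putCustomAttribute("path", path)
                .putCustomAttribute("query", queryString));
    }
    return contentId;
}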
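Two regex details are worth checking in patterns like these. An unescaped `.` inside a lookahead matches any character, not just a literal dot. Also, `group()` returns the whole match, which includes text consumed by a non-capturing `(?:...)` group; only `group(1)` returns the captured digits alone. The small demo below shows both; `RegexPitfalls`, `wholeMatch` and `firstGroup` are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPitfalls {
    // Returns the whole match (group 0), or null if nothing matches.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns capturing group 1, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Unescaped '.' matches 'X', so "Xhtml" satisfies (?=.html).
        System.out.println(wholeMatch("\\d{9}(?=.html)", "123456789Xhtml"));   // 123456789
        // Escaped "\\." only matches a literal dot.
        System.out.println(wholeMatch("\\d{9}(?=\\.html)", "123456789Xhtml")); // null
        // group() includes the non-capturing "contentId=" prefix; group(1) does not.
        System.out.println(wholeMatch("(?:contentId=)(\\d{9})", "contentId=123456789")); // contentId=123456789
        System.out.println(firstGroup("(?:contentId=)(\\d{9})", "contentId=123456789")); // 123456789
    }
}
```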
The problem: this does the job, but there's a specific error scenario that I need to account for.
Whoever creates the push may enter a content ID of the wrong length, and we need to grab it regardless, so assume it can be any number of digits. The title can also contain digits, which is annoying. The content ID will ALWAYS be immediately followed by ".html".
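A minimal sketch, assuming the content ID is the whole run of digits that sits directly before `.html`: replace the fixed `{9}` quantifier with `+`, escape the dot, and read group 1. `ContentIdParser` and `fromPath` are hypothetical names; the pattern would slot into the path branch of `getContentIdFromPathAndQueryString`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentIdParser {
    // One or more digits immediately followed by a literal ".html".
    private static final Pattern PATH_ID = Pattern.compile("(\\d+)\\.html");

    static String fromPath(String path) {
        if (path == null) return null;
        Matcher m = PATH_ID.matcher(path);
        // find() tries start positions left to right. Digit runs in the title are
        // not followed by ".html", so they fail; the first successful start is the
        // first digit of the run ending right before ".html", and the greedy \d+
        // captures that entire run.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fromPath("/foo/bar/Some-title-Goes-here-123456789.html"));      // 123456789
        System.out.println(fromPath("/foo/bar/Top-10-Tips-for-2016-12345.html"));          // 12345
        System.out.println(fromPath("/foo/bar/Top-10-Tips-for-2016-1234567890123.html"));  // 1234567890123
        System.out.println(fromPath("/foo/bar/no-id-here.html"));                          // null
    }
}
```

One limitation no regex can fix: if a title number touches the ID with no separator (e.g. `...-2016123456789.html`), the two digit runs are indistinguishable and the whole run is returned.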