I'm trying to extract a specific <a> tag, the one whose text contains a given date, from a block of HTML.
Here is the unit test in question; the HTML supplied to the method is the string passed to parseBasePage:
public void testParseBasePage() {
    defenseGovContractsParser a = new defenseGovContractsParser("060613");
    String expected = "http://www.defense.gov/contracts/contract.aspx?contractid=5059";
    String result = a.parseBasePage("<td><a id=\"ctl00_ContentPlaceHolder_Body_ContractSummary_dgPRItems_ctl02_lnkPressItem\" title=\"Click for Contracts for June 06, 2013\" class=\"Link12\" href=\"http://www.defense.gov/contracts/contract.aspx?contractid=5059\">Contracts for June 06, 2013</a><span id=\"ctl00_ContentPlaceHolder_Body_ContractSummary_dgPRItems_ctl02_lblSubTitle\" class=\"MoreNews3a\"></span></td>");
    assertEquals(expected, result);
}
Here's the code in question:
public String parseBasePage(String HTML) {
    String contractUrl;
    String yr = date.substring(4, 6);
    String day = date.substring(2, 4);
    String month = getMonthForInt(Integer.parseInt(date.substring(0, 2)));
    Pattern getLink = Pattern.compile("<a.*?" + month + ".*?" + day + ".*?20" + yr + ".*?>");
    Matcher match = getLink.matcher(HTML);
    String link = match.group();
    contractUrl = link.substring(link.indexOf("href") + 6);
    contractUrl = contractUrl.replaceFirst("\">", "");
    return contractUrl;
}

private String getMonthForInt(int m) {
    String month = "invalid";
    m = m - 1;
    DateFormatSymbols dfs = new DateFormatSymbols();
    String[] months = dfs.getMonths();
    if (m >= 0 && m <= 11) {
        month = months[m];
    }
    return month;
}
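One assumption baked into getMonthForInt is worth spelling out: the no-argument DateFormatSymbols constructor uses the JVM's default locale, so the month name that ends up in the regex depends on the machine running the test. Below is a minimal sketch that pins the locale to English, assuming English month names are what the page contains. The class name MonthNameDemo and the method monthName are hypothetical, made up for illustration.

```java
import java.text.DateFormatSymbols;
import java.util.Locale;

public class MonthNameDemo {
    // Returns the English month name for a 1-based month number, or "invalid".
    static String monthName(int m) {
        // Pin the locale so the result does not depend on the JVM default.
        String[] months = new DateFormatSymbols(Locale.ENGLISH).getMonths();
        // Bounds-check so out-of-range input falls through to "invalid".
        if (m >= 1 && m <= 12) {
            return months[m - 1];
        }
        return "invalid";
    }

    public static void main(String[] args) {
        System.out.println(monthName(6));
    }
}
```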
The resulting regex is:
<a.*?June.*?06.*?2013.*?>
which matches as expected in every online regex tester I have tried.
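For comparison with the online testers, here is a minimal standalone sketch of applying the same pattern through java.util.regex. One documented behaviour to keep in mind is that Matcher.group() throws IllegalStateException until a match operation such as find() or matches() has been attempted and has succeeded, whereas online testers run that search step implicitly. The class name RegexFindDemo, the method firstMatch, and the shortened HTML string are hypothetical, made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFindDemo {
    // Returns the first substring of html that matches regex, or null if there is none.
    static String firstMatch(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        // find() performs the search; group() is only valid after it returns true.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened version of the test HTML (assumption: ids and classes omitted).
        String html = "<td><a title=\"Click for Contracts for June 06, 2013\" "
                + "href=\"http://www.defense.gov/contracts/contract.aspx?contractid=5059\">"
                + "Contracts for June 06, 2013</a></td>";
        System.out.println(firstMatch("<a.*?June.*?06.*?2013.*?>", html));
    }
}
```

Returning null when nothing matches is only one possible design choice for the sketch; throwing an exception or returning an Optional would work equally well.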