I am building an Android application that fetches new announcements from my university's website.
This is the relevant HTML on the website (screenshot: http://img690.imageshack.us/img690/1079/88210050.png).
Text version:
<table border="1" width="90%" class="duyuru">
<tbody>
<tr>
<td>
<h3 class="duyuru">Additional Quotas for the Technical Electives</h3>
"19/09/2012"
<h4 class="duyuru">"Additional Quotas for Technical Electives offered in...</h4>
<span class="duyuru"></span>
<br>
<a href="news_image/96.doc">Download</a>
</td>
</tr>
</tbody>
</table>
I can get the text of the h3 and h4 elements ("Additional Quotas for the Technical Electives" and "Additional Quotas for Technical Electives offered in...") with the code below. However, I cannot get the date (19/09/2012), which sits between the h3 and h4 lines.
// Group 1: announcement title inside the h3 tag
String patternStr = "\\<h3 class=\"duyuru\".*?\\>(.*?)\\</h3\\>";
// Group 2: intended to capture the date between h3 and h4
patternStr += "(.*?)"; // This line is problematic
// Group 3: announcement details inside the h4 tag
patternStr += ".*?\\<h4 class=\"duyuru\".*?\\>(.*?)\\</h4\\>";
Pattern pattern = Pattern.compile(patternStr, Pattern.DOTALL);
Matcher matcher = pattern.matcher(content);
String name = "";
String date = "";
String details = "";
while (matcher.find()) {
    name = matcher.group(1);
    date = matcher.group(2);
    details = matcher.group(3);
    Announcement announcement = new Announcement();
    announcement.setName(name);
    announcement.setDate(date);
    announcement.setDetails(details);
    announcements.add(announcement);
}
For the problematic line, I also tried using
.*?\"(.*?)\"
but it didn't work. With that pattern, group 2 contains the string "duyuru" taken from the class attribute of the h4 tag instead of the date.
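To make the problem easier to reproduce, here is a self-contained sketch of a stricter variant I am considering. It rests on assumptions I have not verified against the live page: that the date always has the form dd/MM/yyyy, and that the quotes shown around it may be added by the browser's inspector rather than present in the raw HTML (so the pattern treats them as optional). The class name `AnnouncementRegexSketch` and the method `extract` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnnouncementRegexSketch {

    // Group 1: title (h3), group 2: date, group 3: details (h4).
    // Assumption: the date is dd/MM/yyyy and may or may not be wrapped in quotes.
    static final Pattern ANNOUNCEMENT = Pattern.compile(
            "<h3 class=\"duyuru\"[^>]*>(.*?)</h3>"
            + "\\s*\"?(\\d{2}/\\d{2}/\\d{4})\"?\\s*"
            + "<h4 class=\"duyuru\"[^>]*>(.*?)</h4>",
            Pattern.DOTALL);

    // Returns one {title, date, details} array per announcement found.
    static List<String[]> extract(String content) {
        List<String[]> rows = new ArrayList<>();
        Matcher matcher = ANNOUNCEMENT.matcher(content);
        while (matcher.find()) {
            rows.add(new String[] {matcher.group(1), matcher.group(2), matcher.group(3)});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample input modeled on the HTML above, with the date unquoted.
        String content = "<td>\n"
                + "<h3 class=\"duyuru\">Additional Quotas for the Technical Electives</h3>\n"
                + "19/09/2012\n"
                + "<h4 class=\"duyuru\">Additional Quotas for Technical Electives offered in...</h4>\n"
                + "</td>";
        for (String[] row : extract(content)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Unlike my original pattern, the date group here is anchored directly between `</h3>` and `<h4`, so a lazy `(.*?)` cannot collapse to an empty match. I am not sure this is robust if the page layout changes, and an HTML parser might be a better fit than a regex.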
Does anyone have an idea how I can grab the date information?
Thanks in advance.