I am building an email scraper in Java that needs to extract information from specific emails. Those emails were sent over a period of several years, and the HTML changes slightly from year to year, so my code works for one year but breaks for the previous or following year. I am looking for a way to write code that is robust to these changes. In the snippets below, NEEDED is the value I need, VARIABLE can differ, and SAME TITLE is always the same.
<span class="confirmationtitle">SAME TITLE 1</span></td></tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED1 </span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED2</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED3</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED4</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED5</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED6</span></td>
</tr>
The snippet above is from year x; the one below is from year y. There are multiple table rows like these, with different info.
<tr>
<div style="font-weight: bold; display: block; margin-top: 20px;">SAME TITLE 1</div>
</td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED1 </span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED2</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED3</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED4</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED5</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED6</span></td>
</tr>
<tr>
My code for this specific row in the table:
// One slot per field: gender, first name, middle name, surname, date of birth, nationality.
// The array is named TravellerInfo throughout (it was declared as SpecificInfo but read as TravellerInfo).
String[] TravellerInfo = new String[6];
String TravellerInfoGender = TravellerInfo[0] = headerInfo.split("</span></td>")[1].split("</span>")[0].split(">")[2];
String TravellerInfoFirstname = TravellerInfo[1] = headerInfo.split("</span></td>")[3].split("</span>")[0].split(">")[2];
String TravellerInfoMiddleName = TravellerInfo[2] = headerInfo.split("</span></td>")[5].split("</span>")[0].split(">")[2];
String TravellerInfoSurName = TravellerInfo[3] = headerInfo.split("</span></td>")[7].split("</span>")[0].split(">")[2];
String TravellerInfoDateOfBirth = TravellerInfo[4] = headerInfo.split("</span></td>")[9].split("</span>")[0].split(">")[2];
String TravellerInfoNationality = TravellerInfo[5] = headerInfo.split("</span></td>")[11].split("</span>")[0].split(">")[2];
for (int i = 0; i < TravellerInfo.length; i++) {
    writeToFile(TravellerInfo[i]);
}
return TravellerInfo;
Here headerInfo contains an HTML snippet like one of the two examples above.
I hope there is a way to avoid hard-coding every little change.
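To make the goal concrete, here is a minimal sketch of the kind of attribute-independent extraction I have in mind. It rests on an assumption drawn only from the two snippets above: in both years each data row is two `<td>` cells, each wrapping a single `<span>` with plain text inside, and only the attributes (`class` vs. `style`) differ. The names `RowValueExtractor`, `extractValues` and `maxValues` are hypothetical, and the regular expression is a stand-in; a real HTML parser would probably be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowValueExtractor {
    // Assumed row shape: <td><span ...>label</span></td> <td><span ...>value</span></td>.
    // Attributes are skipped with [^>]*, so class= and style= variants both match.
    // Cell text is assumed to contain no nested tags ([^<]*).
    private static final Pattern ROW = Pattern.compile(
        "<td[^>]*>\\s*<span[^>]*>([^<]*)</span>\\s*</td>\\s*"
        + "<td[^>]*>\\s*<span[^>]*>([^<]*)</span>\\s*</td>",
        Pattern.CASE_INSENSITIVE);

    // Returns up to maxValues right-hand cell values found after the given title.
    static List<String> extractValues(String html, String title, int maxValues) {
        List<String> values = new ArrayList<>();
        int start = html.indexOf(title);
        if (start < 0) {
            return values; // title not present: nothing to extract
        }
        Matcher m = ROW.matcher(html);
        m.region(start, html.length()); // only look at rows after the title
        while (values.size() < maxValues && m.find()) {
            values.add(m.group(2).trim()); // group 2 is the right-hand (NEEDED) cell
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened versions of the year x and year y snippets.
        String yearX = "<span class=\"confirmationtitle\">SAME TITLE 1</span></td></tr>"
            + "<tr><td><span class=\"confirmationleft\">VARIABLE</span></td>"
            + "<td><span class=\"confirmationright\">NEEDED1 </span></td></tr>"
            + "<tr><td><span class=\"confirmationleft\">VARIABLE</span></td>"
            + "<td><span class=\"confirmationright\">NEEDED2</span></td></tr>";
        String yearY = "<tr><div style=\"font-weight: bold;\">SAME TITLE 1</div></td></tr>"
            + "<tr><td><span style=\"width: 145px; vertical-align: top;\">VARIABLE</span></td>"
            + "<td><span style=\"line-height: 18px; width: 488px;\">NEEDED1 </span></td></tr>"
            + "<tr><td><span style=\"width: 145px; vertical-align: top;\">VARIABLE</span></td>"
            + "<td><span style=\"line-height: 18px; width: 488px;\">NEEDED2</span></td></tr>";

        System.out.println(extractValues(yearX, "SAME TITLE 1", 6)); // [NEEDED1, NEEDED2]
        System.out.println(extractValues(yearY, "SAME TITLE 1", 6)); // [NEEDED1, NEEDED2]
    }
}
```

The idea is to anchor on what stays the same (the title text and the two-cell row shape) and ignore what changes (the attributes), so the same call would cover both years without per-year indices.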
Thanks!