I am building an email scraper in Java that needs to extract information from specific emails. These emails were sent over a period of several years. The problem is that the HTML changes slightly every year, so my code works for one year's emails but fails for the following or previous year. I am looking for a way to write code that is robust to these changes. In the snippets below, NEEDED is the value I need, VARIABLE can vary, and SAME TITLE is always the same.

<span class="confirmationtitle">SAME TITLE 1</span></td></tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED1 </span></td>
</tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED2</span></td>
</tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED3</span></td>
</tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED4</span></td>
</tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED5</span></td>
</tr>
<tr>
   <td><span class="confirmationleft">VARIABLE</span></td>
   <td><span class="confirmationright">NEEDED6</span></td>
</tr>

The snippet above is from year x; the one below is from year y. There are multiple table rows like this with different information.

<tr>
   <div style="font-weight: bold; display: block; margin-top: 20px;">SAME TITLE 1</div>
   </td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED1 </span></td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED2</span></td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED3</span></td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED4</span></td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED5</span></td>
</tr>
<tr>
   <td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
   <td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED6</span></td>
</tr>
<tr>

My code for this specific row in the table:

        String[] TravellerInfo = new String[6];

        // Each value is located by fixed split positions, so any change in the markup breaks the indices.
        String TravellerInfoGender = TravellerInfo[0] = headerInfo.split("</span></td>")[1].split("</span>")[0].split(">")[2];
        String TravellerInfoFirstname = TravellerInfo[1] = headerInfo.split("</span></td>")[3].split("</span>")[0].split(">")[2];
        String TravellerInfoMiddleName = TravellerInfo[2] = headerInfo.split("</span></td>")[5].split("</span>")[0].split(">")[2];
        String TravellerInfoSurName = TravellerInfo[3] = headerInfo.split("</span></td>")[7].split("</span>")[0].split(">")[2];
        String TravellerInfoDateOfBirth = TravellerInfo[4] = headerInfo.split("</span></td>")[9].split("</span>")[0].split(">")[2];
        String TravellerInfoNationality = TravellerInfo[5] = headerInfo.split("</span></td>")[11].split("</span>")[0].split(">")[2];

        // Write every extracted value to the output file.
        for (int i = 0; i < TravellerInfo.length; i++)
            writeToFile(TravellerInfo[i]);

        return TravellerInfo;

Here headerInfo contains an HTML snippet like the two examples above.

I hope there is a way to avoid hard-coding every little change.
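
To illustrate the kind of structure-based extraction meant here, below is a minimal sketch that relies only on what the two snippets have in common: a fixed title followed by rows of two cells, each wrapping its text in a span. The class name `RowValueExtractor` and the method `valuesAfterTitle` are hypothetical and not part of the existing scraper. The sketch assumes that cell text contains no nested tags and that a section ends at the first content that is not such a row. A real HTML parser would likely be more robust than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original scraper.
final class RowValueExtractor {

    // One table row with two cells, each holding a <span> with plain text.
    // Attributes (class, style, ...) are skipped with [^>]*, so the pattern
    // does not depend on which year's markup it sees.
    private static final Pattern ROW = Pattern.compile(
            "<tr[^>]*>\\s*<td[^>]*>\\s*<span[^>]*>([^<]*)</span>\\s*</td>"
            + "\\s*<td[^>]*>\\s*<span[^>]*>([^<]*)</span>\\s*</td>\\s*</tr>",
            Pattern.CASE_INSENSITIVE);

    // Returns the trimmed text of the second cell of each consecutive
    // label/value row that follows the title. Returns an empty list if the
    // title is not found.
    static List<String> valuesAfterTitle(String html, String title) {
        List<String> values = new ArrayList<>();
        int start = html.indexOf(title);
        if (start < 0) {
            return values;
        }
        Matcher m = ROW.matcher(html);
        m.region(start, html.length());
        int prevEnd = -1;
        while (m.find()) {
            // Assumption: anything other than whitespace between two rows
            // (for example the next section title) ends the current section.
            if (prevEnd >= 0 && !html.substring(prevEnd, m.start()).trim().isEmpty()) {
                break;
            }
            values.add(m.group(2).trim());
            prevEnd = m.end();
        }
        return values;
    }

    public static void main(String[] args) {
        String yearX = "<span class=\"confirmationtitle\">SAME TITLE 1</span></td></tr>\n"
                + "<tr>\n   <td><span class=\"confirmationleft\">VARIABLE</span></td>\n"
                + "   <td><span class=\"confirmationright\">NEEDED1 </span></td>\n</tr>\n"
                + "<tr>\n   <td><span class=\"confirmationleft\">VARIABLE</span></td>\n"
                + "   <td><span class=\"confirmationright\">NEEDED2</span></td>\n</tr>";
        // prints [NEEDED1, NEEDED2]
        System.out.println(valuesAfterTitle(yearX, "SAME TITLE 1"));
    }
}
```

Because the values come back in row order, the six traveller fields would map to list indices 0 to 5 in the same way for both years, without any reference to class names or inline styles.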

Thanks!

wolteeer