I am trying to convert an HTML table with rowspans and colspans to a 2D array in Java. I found a solution that uses the Java 8 Stream API: "Extract data from complex HTML tables to 2d array in Java".

But I need a solution that does not use streams. I almost have one, but some of the cells sometimes come out as null.

Code:

private static String[][] tableFix2(Elements trElements){
    String[][] table = new String[trElements.first().select("td").size()][trElements.size()];

    for (int tr = 0; tr < trElements.size(); tr++){
        Elements tdElements = trElements.get(tr).select("td");


        for (int td = 0; td < tdElements.size();td++) {
            Element tdEl = tdElements.get(td);
            String tdElString = tdEl.text();
            //System.out.println(tdElString);

            int colspan = tdEl.attr("colspan").equals("") ? 1 : Integer.parseInt(tdEl.attr("colspan"));
            int rowspan = tdEl.attr("rowspan").equals("") ? 1 : Integer.parseInt(tdEl.attr("rowspan"));

            if (colspan > 1 && rowspan <= 1) {

                for (int c = td; c < td + rowspan; c++) {
                    if (table[c][tr] == null)
                        table[c][tr] = tdElString;
                }


            } else if (rowspan > 1 && colspan <= 1) {

                for (int r = tr; r < tr + rowspan; r++) {
                    if (table[td][r] == null)
                        table[td][r] = tdElString;
                }


            } else if (rowspan > 1 && colspan > 1) {
                for (int r = tr; r < tr + rowspan; r++) {
                    for (int c = td; c < td + colspan; c++) {
                        if (table[c][r] == null)
                            table[c][r] = tdElString;
                    }
                }

            } else {
                if (table[td][tr] == null)
                    table[td][tr] = tdElString;
            }
        }

    }
    System.out.println(Arrays.deepToString(table));


    return table;
}

trElements is the input to this function; I used Jsoup to select all the tr elements of the table.

My table (image not preserved; the HTML source is given below):

The output:

[[A, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
[B, 8:25 - 7:40, 8:30-9:15, 9:20-10:05, 10:25-11:10, 11:15-12:00, 12:05-12:50, 13:10-13:55, 14:00-14:45, 14:50-15:35, 15:40-16:25], 
[C, , classes,classes, , , , , , , ], 
[D, homework,homework, , , , , , , , ], 
[E, , , , , , , , , , ], 
[F, , , , , , , , , , ], 
[G, , , , playing,playing, , , , , ], 
[H, , null, null, sleeping,sleeping, , , , , ]]

HTML code:

<table class="c6" dir="rtl">
<tbody>
<tr class="c23" style="height: 31px;">
<td class="c15" style="width: 19px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">A</span></p>
</td>
<td class="c12" style="width: 74px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">B</span></p>
</td>
<td class="c34" style="width: 68px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">C</span></p>
</td>
<td class="c13" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">D</span></p>
</td>
<td class="c27" style="width: 21px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">E</span></p>
</td>
<td class="c31" style="width: 21px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">F</span></p>
</td>
<td class="c41" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">G</span></p>
</td>
<td class="c41" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">H</span></p>
</td>
</tr>
<tr class="c16" style="height: 45px;">
<td class="c15" style="width: 19px; height: 45px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">1</span></p>
</td>
<td class="c12" style="width: 74px; height: 45px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">8:25 - 7:40</span></p>
</td>
<td class="c11" style="width: 68px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 91px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
homework</td>
<td class="c14" style="width: 21px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 46px;">
<td class="c15" style="width: 19px; height: 46px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">2</span></p>
</td>
<td class="c12" style="width: 74px; height: 46px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">8:30-9:15</span></p>
</td>
<td class="c11" style="width: 68px; height: 77px;" colspan="1" rowspan="2">
<p class="c3" dir="rtl">&nbsp;</p>
classes&nbsp;</td>
<td class="c14" style="width: 21px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 31px;">
<td class="c15" style="width: 19px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">3</span></p>
</td>
<td class="c33" style="width: 74px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">9:20-10:05</span></p>
</td>
<td class="c19" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c47" style="height: 51px;">
<td class="c36" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">4</span></p>
</td>
<td class="c29" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">10:25-11:10</span></p>
</td>
<td class="c44" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 102px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
playing</td>
<td class="c9" style="width: 88px; height: 102px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
sleeping</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">5</span></p>
</td>
<td class="c53" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">11:15-12:00</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c39" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">6</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">12:05-12:50</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">7</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">13:10-13:55</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">8</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">14:00-14:45</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">9</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">14:50-15:35</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">10</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">15:40-16:25</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
</tbody>
</table>

What is wrong here?

Brian Tompsett - 汤莱恩
Napkin

1 Answer

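Count the td elements in the 3rd and 4th rows and you will see the problem. You iterate only up to tdElements.size(), and some rows of your input have fewer td elements than the table has columns: a cell that is covered by a rowspan from the row above is not repeated in the HTML. Because you use the td index as the column index, every cell after such a gap lands one column too far to the left, and the last column of that row is never filled.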
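One way to handle the missing cells without streams is to keep a separate column cursor per row and skip slots that an earlier rowspan has already claimed. The following is only a sketch under stated assumptions: `Cell` and `TableGrid` are hypothetical names introduced here, and `Cell` stands in for a parsed td so that the example runs without Jsoup. The result keeps the question's `table[col][row]` orientation, and the width is taken from the colspans of the first row.

```java
// Hypothetical stand-in for one parsed <td>: its text plus span attributes.
final class Cell {
    final String text;
    final int colspan;
    final int rowspan;

    Cell(String text, int colspan, int rowspan) {
        this.text = text;
        this.colspan = colspan;
        this.rowspan = rowspan;
    }
}

final class TableGrid {
    // Expands rows of cells into a grid indexed as table[col][row].
    static String[][] expand(Cell[][] rows) {
        if (rows.length == 0) {
            return new String[0][0];
        }
        // Assumption: the first row defines the table width.
        int width = 0;
        for (Cell c : rows[0]) {
            width += c.colspan;
        }
        int height = rows.length;
        String[][] table = new String[width][height];
        boolean[][] taken = new boolean[width][height];

        for (int r = 0; r < height; r++) {
            int col = 0; // real column position, not the td index
            for (Cell cell : rows[r]) {
                // Skip slots already claimed by a rowspan from a row above.
                while (col < width && taken[col][r]) {
                    col++;
                }
                // Fill the whole colspan x rowspan block, clamped to the grid.
                for (int dr = 0; dr < cell.rowspan && r + dr < height; dr++) {
                    for (int dc = 0; dc < cell.colspan && col + dc < width; dc++) {
                        table[col + dc][r + dr] = cell.text;
                        taken[col + dc][r + dr] = true;
                    }
                }
                col += cell.colspan;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Cell[][] rows = {
            { new Cell("A", 1, 1), new Cell("B", 1, 1), new Cell("C", 1, 1) },
            { new Cell("1", 1, 1), new Cell("X", 1, 2), new Cell("", 1, 1) },
            // Only two cells here: column B is covered by the rowspan above.
            { new Cell("2", 1, 1), new Cell("y", 1, 1) }
        };
        // prints [[A, 1, 2], [B, X, X], [C, , y]]
        System.out.println(java.util.Arrays.deepToString(expand(rows)));
    }
}
```

With Jsoup, each `Cell` could be built from `tdEl.text()` and the parsed `tdEl.attr("colspan")` and `tdEl.attr("rowspan")` values, as the question already does. A single fill loop also replaces the four branches in the question, whose colspan-only branch loops to `td + rowspan` instead of `td + colspan`.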

gagan singh