
I have the following HTML, extracted from a website, stored in a Java String variable. I want to look at every table row and, if a row has a data cell containing the words "Current Assignments Report", read the other data cells in that row: add the course name to one ArrayList, and add the number that follows javascript:rlViewItm in the href to a second ArrayList. Here is an example of that line:

<a href="javascript:rlViewItm('2049144736880355316');">View</a>

To make this concrete: the code would start with the HTML below, which is a String. It would go through each table, and each table row within it, separately. If a table row has a data cell that says "Current Assignments Report", it would look at the other data cells in that row and find a line like the one below, where only the number changes. I want these numbers stored in a separate ArrayList.

<a href="javascript:rlViewItm('2049145027227690148');">View</a>

I have worked with sorting strings in Java before, but I don't understand how to store each item in its own ArrayList based on criteria taken from an HTML table.

I would greatly appreciate anyone's help who can do this in Java!

  <div class="ed-formArea">
  <div class="ed-formHeader noText">
  </div>
  <div class="ed-formContent">
<!--SECTION CODE null Section #1  ENDS - DO NOT MODIFY -->
<!--SECTION CODE null CUSTOM CODE BEGIN -->


<form method="post" name="resourceLabelForm" action="/post/UserDocList.page">
<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td class="ed-tdEnd">
            Private Reports


                <small><small>&nbsp;(1-40 of 40&nbsp;items)</small></small>

        </td></tr>
</tbody>
</table>

 </form>

<form method="post" name="userDocListTableForm" action="/post/UserDocList.page">
  <input type="hidden" name="selectAllEvent" value="" />
  <input type="hidden" name="deselectAllEvent" value="" />
  <table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>


</tbody>
</table>



<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td>&nbsp;</td><td valign="bottom" width="12%">
          <div class="smaller"><strong>
            Report Date
          </strong></div>
        </td><td valign="bottom" width="8%">
          <div class="smaller"><strong>Report</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>View Home Page</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>Report Name</strong></div>
        </td><td valign="bottom" width="2%" class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027192329860');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027227690148');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027213095124');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027201539636');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/10/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027226480084');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027229871460');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027216196756');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049144831908197844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School"> 
      Local High School
      </a> 
    </td><td>

            Student Grades and Graduation Credit Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/07/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027196480420');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/02/14
        </td><td>
          <a href="javascript:rlViewItm('2049144736912474660');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Current Absences Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031942836');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031809620');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025439028');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936016776612');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936060013524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025100916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936022815204');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936043227972');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049145025811761220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 3 Absences as of Mar 31, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          03/08/14
        </td><td>
          <a href="javascript:rlViewItm('2049144992192941348');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP3 as of Feb 28
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144934670566308');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 2 Absences as of Jan 24, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824058685812');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824085227764');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824074464628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824082665540');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824049900244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824039718948');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824065741444');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824083064244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          12/13/13
        </td><td>
          <a href="javascript:rlViewItm('2049144874776524020');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP2 as of Dec 06
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144822701443172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 1 Absences as of Nov 04, 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736860489172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736881890916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736862291156');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736866166628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736903239140');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736880355316');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736894413524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736870593220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          10/04/13
        </td><td>
          <a href="javascript:rlViewItm('2049144777895089844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP1 as of Sep 27
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    </tbody>
</table>
Vineet Shah

1 Answer


Disclaimer: Do not use regular expressions to parse HTML.


If the HTML is as strictly formatted as in your posted code, you can follow these steps:

Using the Pattern.DOTALL flag, search the entire string with

<tr>(.*?)<td> Current Assignments Report </td>.*?</tr>

Iterating over the matches with Matcher.find() puts each assignment's row data into capture group one. Example match:

 <td>
  <div class="ed-tdSpacer"></div></td>
 <td valign="center">&nbsp;</td>
 <td> 04/02/14 </td>
 <td> <a href="javascript:rlViewItm('2049145027229871460');">View</a> </td>
 <td> <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a> </td>

In this text, search for each instance of <td> (.*?) </td>. The contents of each data cell are placed in capture group one. Searching the above text results in these matches:

04/02/14
<a href="javascript:rlViewItm('2049145027229871460');">View</a>
<a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a>

The date can be pretty much taken as is, and the other two items will need to be parsed based on what you want to get out of them.

But again, if your input is really as strict as you imply, it shouldn't be that bad.
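
As a rough sketch of the two steps above, and not a definitive implementation: the code below first matches each table row, keeps only the rows whose text contains the report name, and then pulls the cell contents out of each kept row. It departs from the patterns above in two ways that I am assuming are needed for the HTML in the question: the tags are matched as `<tr[^>]*>` and `<td[^>]*>` because some rows are written `<tr class="ed-alternateRow">`, and `\s*` is used around cell contents because the text is surrounded by newlines rather than single spaces. The class name `RowCellSketch`, the method `cellsOfRowsContaining`, and the `sampleRow` fixture are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowCellSketch {
    // A whole table row; [^>]* also accepts <tr class="ed-alternateRow">.
    private static final Pattern ROW =
            Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.DOTALL);
    // One data cell; surrounding whitespace stays outside group 1.
    private static final Pattern CELL =
            Pattern.compile("<td[^>]*>\\s*(.*?)\\s*</td>", Pattern.DOTALL);

    /** Returns the cell contents of every row whose HTML contains reportName. */
    static List<List<String>> cellsOfRowsContaining(String html, String reportName) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            String rowHtml = row.group(1);
            if (!rowHtml.contains(reportName)) {
                continue; // some other kind of report
            }
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(rowHtml);
            while (cell.find()) {
                cells.add(cell.group(1));
            }
            rows.add(cells);
        }
        return rows;
    }

    // Hypothetical fixture: builds one row laid out like the rows in the question.
    static String sampleRow(String date, String id, String course, String report) {
        return "<tr class=\"ed-alternateRow\">\n"
             + "<td><div class=\"ed-tdSpacer\"></div></td>\n"
             + "<td valign=\"center\">&nbsp;</td><td>\n          " + date + "\n        </td><td>\n"
             + "          <a href=\"javascript:rlViewItm('" + id + "');\">View</a>\n"
             + "        </td><td>\n"
             + "    <a class=\"lochomepage\" href=\"/pages/Local_High_School\"> \n"
             + "      " + course + "\n      </a> \n"
             + "    </td><td>\n\n            " + report + "\n"
             + "        </td><td class=\"ed-tdEnd\">&nbsp;</td></tr>\n";
    }

    public static void main(String[] args) {
        String html =
              sampleRow("04/11/14", "2049145027192329860", "PRINS OF ENGIN B",
                        "Current Assignments Report")
            + sampleRow("04/01/14", "2049144936031942836", "PRINS OF ENGIN B",
                        "Marking Period 3 as of Mar 31 2014");
        for (List<String> cells : cellsOfRowsContaining(html, "Current Assignments Report")) {
            System.out.println(cells);
        }
    }
}
```

Matching one row at a time keeps a lazy `.*?` from running across several rows before it reaches the report name, which is the main reason for the extra step. The link cells still come back as raw `<a ...>` markup and need a second pass to extract the number and the course name.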


Updated: With your most recent input (the long file you posted), this regex captures each item, as best I understand your needs:

<td>\s*?Current Assignments Report.*?<td>\s*?([0-9]{2}/[0-9]{2}/[0-9]{2}).*?<a href="javascript:rlViewItm\('([0-9]+)'\);">View</a>.*?<a class="lochomepage" href="([^"]+)">\s*([\w ]+)\s*</a>


Debuggex Demo

Note this takes a while to load, because the input is so long.

Capture groups:

  1. Date
  2. Item number
  3. The lochomepage url
  4. The link display

I know it's been a while since you asked this. Maybe it still helps...

aliteralmind
  • Thank you so much for the response. However, when I tried that, I wasn't able to access all the table rows with the item numbers; only a few of them popped up. Any idea why that would happen? – Vineet Shah Apr 08 '14 at 22:33
  • Do they all start with `[space]` and end with `[space]`? – aliteralmind Apr 08 '14 at 22:54
  • Yes they all have data cells with Current Assignments Report – Vineet Shah Apr 09 '14 at 02:47
  • I'm going to need a lot more information if I'm going to be able to help you. I think you're talking about this line: `View`. Is *this* line always surrounded by `[space]` and `[space]`? – aliteralmind Apr 09 '14 at 02:51
  • I modified my question and tried to make it a little clearer. I would greatly appreciate ideas on how to solve my problem. – Vineet Shah Apr 12 '14 at 21:12
  • Yes and to answer your previous question, that line is always surrounded by a space – Vineet Shah Apr 14 '14 at 02:22
  • When I try to search with that regex, it underlines \s and says it's an invalid escape sequence. Any idea why it would say that? – Vineet Shah Apr 16 '14 at 14:52
  • Where is this happening? In debuggex? With what flavor (JavaScript, Python, PCRE)? – aliteralmind Apr 16 '14 at 14:56
  • No, in Java. I'm doing it in Eclipse; here is the code: Pattern pattern = Pattern.compile("<td>\s*?Current Assignments Report.*?<td>\s*?([0-9]{2}/[0-9]{2}/[0-9]{2}).*?<a href="javascript:rlViewItm\('([0-9]+)'\);">View</a>.*?<a class="lochomepage" href="([^"]+)">\s*([\w ]+)\s*</a>", Pattern.DOTALL); Matcher matcher = pattern.matcher(html); while (matcher.find()) { System.out.print("Start index: " + matcher.start()); System.out.print(" End index: " + matcher.end() + " "); System.out.println(matcher.group()); } – Vineet Shah Apr 16 '14 at 18:34
  • `Pattern.compile("<td>\s*?Current Assignments Report.*?<td>\s*?([0-9]{2}/[0-9]{2}/[0-9]{2}).*?<a href="javascript:rlViewItm\('([0-9]+)'\);">View</a>.*?<a class="lochomepage" href="([^"]+)">\s*([\w ]+)\s*</a>", Pattern.DOTALL);` is illegal because (a) double-quotes must be escaped (`\"`), and (b) *escapes* must be escaped (`\\ `). Change it to `Pattern.compile("<td>\\s*?Current Assignments Report.*?<td>\\s*?([0-9]{2}/[0-9]{2}/[0-9]{2}).*?<a href=\"javascript:rlViewItm\\('([0-9]+)'\\);\">View</a>.*?<a class=\"lochomepage\" href=\"([^\"]+)\">\\s*([\\w ]+)\\s*</a>", Pattern.DOTALL);` – aliteralmind Apr 17 '14 at 14:39
  • See http://stackoverflow.com/a/22031684/2736496 and http://stackoverflow.com/a/1379236/2736496 – aliteralmind Apr 17 '14 at 14:40