
I have a database that holds the HTML of a court's docket system so that I can easily search the docket for certain motions, briefs, etc. Each new entry in the docket is displayed with a line between them to delimit it as a new entry.

Each new entry begins with a TBODY tag. The text in the docket entry might be what I'm looking for. For example, if I search for "motion to enforce", then when the regex finds that, it pulls that information out (the link directly to the scanned document on the court's website). I want to display the entire docket entry in my search results so I can see "Motion to Enforce" and determine whether it is a "Motion to Enforce Visitation" or a "Motion to Enforce Settlement".

The problem I have is that every single docket entry begins with the same TBODY tag, so if I use a regex like

/\<TBODY class=\"docketEntry\"\>(.*?)(motion to enforce)/i 

the match returned starts at the very first TBODY entry on the page and includes all the text in between until it reaches "motion to enforce". I don't want this; I want the match to start at the beginning of the docket entry that actually contains "motion to enforce". It feels like I need to find the "motion to enforce" language and work back towards the TBODY tag, but I'm not sure how to work backwards or whether it is even possible.

My other thought was to do a strrev() and then match it and reverse the string back but I figured there might be a better way to do this.

Another thought was to match the TBODY, but if the regex finds another TBODY before it gets to the "motion to enforce" language, have it drop the first TBODY from the returned match.

EXAMPLE:

<TBODY class="docketEntry">
some uninteresting docket entry here
</TBODY>
<TBODY class="docketEntry">
Motion to Enforce Visitation
</TBODY>

With the (.*?), this entire example would be a match, but I only want the TBODY immediately preceding the "Motion to Enforce". My thought was that if it only matched a pattern that had a TBODY, followed by any text other than another TBODY, then the "Motion to Enforce" text, that would give me exactly what I want.
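The idea in the last paragraph ("a TBODY, followed by any text other than another TBODY") can be expressed with a negative lookahead applied before each consumed character. The following is a minimal sketch in Python rather than PHP; the sample string and variable names are made up for illustration, and the pattern assumes the entries look like the example above. The same pattern syntax is accepted by PCRE, so it should carry over to PHP's preg functions with the `i` and `s` modifiers.

```python
import re

# Hypothetical sample markup modeled on the example above.
html = """<TBODY class="docketEntry">
some uninteresting docket entry here
</TBODY>
<TBODY class="docketEntry">
Motion to Enforce Visitation
</TBODY>"""

# (?:(?!<TBODY).)*? consumes one character at a time, but only while the
# upcoming text does not begin another <TBODY tag. The match therefore
# cannot start in one entry and run into the next.
pattern = re.compile(
    r'<TBODY class="docketEntry">'   # opening tag of a docket entry
    r'(?:(?!<TBODY).)*?'             # any text that is not another <TBODY
    r'motion to enforce'             # the phrase being searched for
    r'.*?</TBODY>',                  # rest of the entry up to its close
    re.IGNORECASE | re.DOTALL,
)

match = pattern.search(html)
entry = match.group(0) if match else None
print(entry)
```

With the sample data, `entry` holds only the second TBODY block. The lookahead is checked at every character, so this can be slow on very large pages.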

The point of this is to be able to have this in a MySQL query and fetch exactly what I need cutting out the steps of parsing or matching anything after I get the result.

Thanks for ANY help!

Jarod

EDIT: With this being 6 years ago, I have since learned WHY regex on HTML is such a bad idea. It's slow and prone to errors. The best way I've found is Simple HTML DOM 1.5.

j_allen_morris

1 Answer


You could use preg_match_all to match every docket entry in the string, then pick out the entries you want from the list of matches.
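As a rough illustration of this suggestion, here is a sketch in Python, where `re.findall` stands in for PHP's preg_match_all. The sample string and the names `entries` and `hits` are hypothetical, and the entry pattern assumes each docket entry is a single non-nested TBODY block as in the question's example.

```python
import re

# Hypothetical sample markup modeled on the question's example.
html = """<TBODY class="docketEntry">
some uninteresting docket entry here
</TBODY>
<TBODY class="docketEntry">
Motion to Enforce Visitation
</TBODY>"""

# Step 1: collect every docket entry as its own string.
entries = re.findall(
    r'<TBODY class="docketEntry">.*?</TBODY>',
    html,
    re.IGNORECASE | re.DOTALL,
)

# Step 2: keep only the entries that mention the search phrase.
hits = [e for e in entries if re.search(r'motion to enforce', e, re.IGNORECASE)]

for hit in hits:
    print(hit)
```

Splitting the work into two simple patterns avoids the lookahead entirely, at the cost of filtering in application code instead of in a single expression.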

Philipp