0

I am writing a curl script to collect information about some sex offenders. I have developed a script that picks up links like the one below:

http://criminaljustice.state.ny.us/cgi/internet/nsor/... (snipped URL)

When I follow this link, I want to read the value of every field on the page (Offender Id, Last Name, and so on) into my own variables. I am very weak at regex, which is why I am asking here. Or is there another way?

Can anybody help me do that?

Alan Moore

3 Answers

4

phpQuery is very nice for screen-scraping in PHP. It lets you access the DOM using the same methods jQuery has.

Chad Birch
1

You don't want regexes (see Can you provide some examples of why it is hard to parse XML and HTML with a regex?). Look for an HTML parser for PHP instead. See this answer to Can you provide an example of parsing HTML with your favorite parser?
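As a rough illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). The markup in `$html` is hypothetical: the real page layout is not shown in the question, so the two-cell table rows and the `extractFields` helper name are assumptions that would need adjusting to the actual page.

```php
<?php
// Hypothetical markup standing in for the real page, which is not shown in the question.
$html = <<<HTML
<table>
  <tr><td>Offender Id:</td><td>12345</td></tr>
  <tr><td>Last Name:</td><td>DOE</td></tr>
</table>
HTML;

// Hypothetical helper: maps each label cell to the value cell next to it.
function extractFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];
    // Assumption: every field is a table row with exactly two cells (label, value).
    foreach ($xpath->query('//tr[count(td) = 2]') as $row) {
        $cells = $xpath->query('td', $row);
        $label = rtrim(trim($cells->item(0)->textContent), ':');
        $value = trim($cells->item(1)->textContent);
        if ($label !== '') {
            $fields[$label] = $value;
        }
    }
    return $fields;
}

$fields = extractFields($html);
echo $fields['Offender Id'], "\n";
```

Because the lookup is driven by the document structure rather than by character positions, changes in whitespace or line breaks inside the page should not affect it, although a change in the table layout itself still would.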

Community
Chas. Owens
0

I tend to agree with the previous poster about RegEx not being the right tool for the job. If you just want a quick and dirty expression, here goes:

Offender Id:.*
.* [0-9]*

NOTE: You must include the newline in this expression. Also note that this is very fragile: it will break if the source you are parsing changes much at all.
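For reference, a sketch of how that expression might be used from PHP with `preg_match`. It is an adaptation, not the exact pattern above: a capture group is added and `[0-9]*` becomes `[0-9]+` so that an empty match is rejected. The sample strings and the `offenderIdFromText` helper name are hypothetical.

```php
<?php
// Hypothetical helper: returns the digits on the line after "Offender Id:", or null.
function offenderIdFromText(string $text): ?string
{
    // Adapted pattern: label, rest of that line, a newline, then a space
    // followed by the digits we capture.
    $pattern = '/Offender Id:.*\n.* ([0-9]+)/';
    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Hypothetical input where the value sits on the line after the label.
$twoLines = "Offender Id:</td>\n<td> 12345</td>";
// The same data on a single line: the required newline is missing.
$oneLine = "Offender Id: 12345";

var_dump(offenderIdFromText($twoLines));
var_dump(offenderIdFromText($oneLine));
```

The second call finds nothing because the label and value share a line, which is the kind of small source change that breaks this approach.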