Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I am working on a regular expression to extract a pattern of data from a very large HTML file generated by various PHP and JavaScript programs. All I need is to match a pattern like the two lines below

<div id="slotqty" class="slotqty" title="<br>Start Date: 04/08/2011<br>End Date  : 04/08/2011<br>">113.67</div></div></div>
<div id="slotcity" class="slotcity">RICE</div><div id="slotqty" class="slotqty" title="<br>"Start Date: 04/06/2011<br>End Date  : 04/06/2011<br>">57</div></div></div>

from a very cluttered HTML file. My code so far is:

<?php

$url = "http//wwww.amamamamama.com/example";

$file = file_get_contents($url);

preg_match_all ('/[^<div id="slotqty" class="slotqty" title="<br>] + </div>{3,3}$/', $url, $output);

echo "<pre>";

print_r ($output);

echo "</pre>";

?>

Any ideas on how to solve this problem better than this? Thanks in advance for your help,

John

  • will those strings always be static? – JohnP Apr 10 '11 at 08:12
  • *insert obligatory "DON'T PARSE HTML WITH REGEXES" warning here.* – Alex Apr 10 '11 at 08:14
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). This question has been answered before. See the post. –  Apr 10 '11 at 08:20
  • this question has been asked a dozen times already. use the search – cweiske Apr 10 '11 at 08:27
  • It's unclear what it is you want to extract. The regex does not have any capture groups so (besides being malformed) it won't capture anything. – Theo Apr 10 '11 at 08:51

1 Answer

While I'm not going to board the "regular expressions are bad" train (somebody else can tell you, or just poke around SO), I'll simply offer an alternative, unless you specifically must use regular expressions.

PHP Simple HTML DOM Parser is a very easy-to-use scraper that supports a wide variety of scraping methods. Using it would help alleviate some of the confusion and trouble that can occur with regular expressions, and if the content you are scraping changes, you can quickly make the appropriate changes without having to rewrite an entire regular expression (easier maintainability).
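To give a rough idea of what the parser-based approach looks like, here is a minimal sketch. Note that it swaps in PHP's bundled DOMDocument and DOMXPath classes instead of Simple HTML DOM, so that it runs without any extra download. The function name `extractSlotQuantities` and the inline sample markup are my own assumptions for illustration (the second sample line is a cleaned-up version of the one in the question); in practice you would pass in the string returned by `file_get_contents($url)`.

```php
<?php
// Sample markup modelled on the question (assumption: the real page has the same shape).
$html = <<<'HTML'
<div id="slotqty" class="slotqty" title="<br>Start Date: 04/08/2011<br>End Date  : 04/08/2011<br>">113.67</div></div></div>
<div id="slotcity" class="slotcity">RICE</div><div id="slotqty" class="slotqty" title="<br>Start Date: 04/06/2011<br>End Date  : 04/06/2011<br>">57</div></div></div>
HTML;

// Hypothetical helper: returns the text and title of every div with class "slotqty".
function extractSlotQuantities(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match on the class token rather than the id, since the id repeats in the page.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " slotqty ")]';

    $rows = [];
    foreach ($xpath->query($query) as $div) {
        $rows[] = [
            'qty'   => trim($div->textContent),
            'title' => $div->getAttribute('title'),
        ];
    }
    return $rows;
}

print_r(extractSlotQuantities($html));
```

If the markup changes later (say the class is renamed), only the XPath query needs to be touched, which is the maintainability point made above.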
