Possible Duplicate:
How to parse and process HTML with PHP?

I have the following HTML output and want to extract data from it. I tried preg_match() and preg_match_all(), with no success.

<td width="130" valign="top">
Jane Doe<br />
            101 Marisa Cir <br />
            Staten Island NY, 10309<br /><br>

I want to match the "address data" as:

Jane Doe, 101 Marisa Cir Staten Island NY, 10309

I fetch the page with cURL. I tried something like this, with no success:

preg_match('~<td width="130" valign="top">(.*?[^<])<br /><br>~i', $str, $showme);
bsteo
  • Looks related, but can you really say it's a duplicate? @Gordon – bozdoz Oct 15 '12 at 15:00
  • @bozdoz in this particular case yes. It is good enough because how to achieve what the OP is asking for has been asked and answered a hundred times before so closing against the canonical is fine. See http://meta.stackexchange.com/questions/104877/are-specific-questions-duplicates-of-general-ones – Gordon Oct 15 '12 at 15:02

2 Answers

[^<] matches any single character that is not <, so here it only constrains the last character before the closing <br /><br> and is not needed. What happens if you try just:

preg_match('~<td width="130" valign="top">(.+?)<br /><br>~i', $str, $showme);

If you want to remove the remaining <br /> tags afterwards, you can strip them from the captured text with a replace.
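As a rough sketch of that clean-up step, the snippet below splits a captured cell on its <br> tags and joins the pieces into the format the question asks for. The $captured string is a hypothetical stand-in for what $showme[1] might hold; it is an assumption, not output from the real page.

```php
<?php
// Hypothetical captured group, standing in for $showme[1] (an assumption).
$captured = "\nJane Doe<br />\n            101 Marisa Cir <br />\n            Staten Island NY, 10309";

// Split on <br>, <br/> or <br /> (case-insensitive), then trim each piece.
$parts = preg_split('~<br\s*/?>~i', $captured);
$parts = array_values(array_filter(array_map('trim', $parts), 'strlen'));

// Treat the first piece as the name and the remaining pieces as the address.
$name = array_shift($parts);
$address = $name . ', ' . implode(' ', $parts);

echo $address, "\n"; // Jane Doe, 101 Marisa Cir Staten Island NY, 10309
```

Splitting and trimming, rather than replacing the tags with an empty string, also gets rid of the indentation and line breaks around each line.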

Ross McLellan

You need the s modifier, as described here. It makes the dot match newlines, which you need because your text spans multiple lines. You can use a regex like this:

preg_match_all('~"top">(.*?)<br />(.*?)<br />(.*?)<br /><br>$~s', $text, $matches);

That should work. See the codepad example here.
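To see the effect of the s modifier in isolation, here is a small sketch that runs the same simplified pattern with and without it. The $html string is retyped from the question and uses Unix line endings, which is an assumption about the input.

```php
<?php
// Sample cell retyped from the question (Unix line endings assumed).
$html = "<td width=\"130\" valign=\"top\">\nJane Doe<br />\n"
      . "            101 Marisa Cir <br />\n"
      . "            Staten Island NY, 10309<br /><br>";

// Without s, "." does not match "\n", so the multi-line cell cannot match.
$withoutS = preg_match('~"top">(.*?)<br /><br>~', $html, $m1);

// With s, "." also matches newlines and the whole cell is captured.
$withS = preg_match('~"top">(.*?)<br /><br>~s', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```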

bozdoz
  • Tried what you said, still getting no data, I get empty Arrays: Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) ) – bsteo Oct 15 '12 at 15:21
  • You see that the codepad example works though right? @xtmtrx Maybe there's something more to your situation? – bozdoz Oct 15 '12 at 15:34
  • Yes, the example works perfect, but when I fetch data from my webpage I get no result matched. Weird. – bsteo Oct 15 '12 at 15:49
  • Regular Expressions can be tricky like that. Let me know if there are any updates to your situation @xtmtrx. – bozdoz Oct 15 '12 at 15:51
  • Working on it, I'll get back, your answer is right, should work because in the example works but not in the live test. – bsteo Oct 15 '12 at 15:57
  • Just took a deeper look: it seems my HTML page, being under IIS and ColdFusion, prints out DOS-formatted data with "^M" at the end of each line. Could that be it? All my testing so far was in a Unix environment. How can I match a DOS carriage return? – bsteo Oct 15 '12 at 16:00
  • This is my skeleton of the script, in the header is the page I try to get data parsed from. http://codepad.viper-7.com/FyYxsV – bsteo Oct 15 '12 at 16:15
  • 1
    Works now, had to remove the "$" – bsteo Oct 15 '12 at 16:50
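
A plausible explanation for that last comment, sketched under assumptions: without the m modifier, $ only matches at the very end of the subject (or just before a final newline), so on a full page where more markup follows the cell, the anchored pattern can never match. The sample $text below is made up for illustration, with DOS (CRLF) line endings and a trailing </td>; it is not the asker's real page.

```php
<?php
// Assumed sample: CRLF line endings and more markup after the cell content.
$text = "<td width=\"130\" valign=\"top\">\r\nJane Doe<br />\r\n"
      . "   101 Marisa Cir <br />\r\n"
      . "   Staten Island NY, 10309<br /><br>\r\n</td>";

$anchored   = '~"top">(.*?)<br />(.*?)<br />(.*?)<br /><br>$~s';
$unanchored = '~"top">(.*?)<br />(.*?)<br />(.*?)<br /><br>~s';

// "$" demands end of subject right after <br /><br>, but "\r\n</td>" follows.
$anchoredCount = preg_match_all($anchored, $text, $m1);

// Dropping "$" lets the match end wherever <br /><br> occurs.
$unanchoredCount = preg_match_all($unanchored, $text, $m2);

// trim() strips "\r" and "\n" as well as spaces, so CRLF needs no special case.
$fields = array_map('trim', [$m2[1][0], $m2[2][0], $m2[3][0]]);

var_dump($anchoredCount, $unanchoredCount);
echo implode(' | ', $fields), "\n";
```

With the s modifier the dot already matches the carriage return, so the DOS line endings only show up as stray whitespace in the captured groups, which trim() removes.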