
I am trying to extract the address and telephone number from HTML code.

First I get the contents of the member-addresses container from the content of the page:

    preg_match('/id="member-addresses".*?<\/div>/is', $webpage, $contact_details);

This returns:

    id="member-addresses">
                <h2>Contact details</h2>
                    <h3 id="foobar">Work</h3>
                        <p>
                            123 Fake Street, Main Area, PG42 TGJ<br />

                            Tel: 020 9 555 42589<br />
</p>
</div>

Now I want to get the Work address.

    preg_match('/Work</h3><p>.*?<br \/>/', $contact_details[0], $address_work);

This is not returning anything. What is wrong with it?

Walrus
  • Using regex for HTML parsing is a [bad idea](http://stackoverflow.com/a/1732454/2370483) – Machavity Mar 07 '15 at 16:10
  • What's the best way to do it? – Walrus Mar 07 '15 at 16:11
  • Aside from regex, there are a few ways to do it: http://php.net/manual/en/domdocument.loadhtml.php or http://php.net/manual/en/book.simplexml.php (if it is valid XHTML). There are also add-on libraries you can use. – chris85 Mar 07 '15 at 16:24
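
The DOM-based route suggested in the comments could look roughly like the sketch below. It is only an illustration, not code from the thread: the opening <div> tag, the variable names, and the XPath expression are assumptions based on the fragment shown in the question.

```php
<?php
// Hypothetical page fragment; the opening <div> tag is assumed,
// since the question only shows the markup from the id attribute onward.
$webpage = '<div id="member-addresses">
    <h2>Contact details</h2>
    <h3 id="foobar">Work</h3>
    <p>
        123 Fake Street, Main Area, PG42 TGJ<br />

        Tel: 020 9 555 42589<br />
    </p>
</div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is often not perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($webpage);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First <p> following the "Work" heading inside the container.
$p = $xpath->query(
    '//div[@id="member-addresses"]/h3[normalize-space()="Work"]/following-sibling::p[1]'
)->item(0);

$lines = [];
if ($p !== null) {
    // Each non-empty text node between the <br /> elements is one detail line.
    foreach ($p->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
            $lines[] = trim($node->nodeValue);
        }
    }
}

print_r($lines);
```

With this approach the address and the telephone line come back as separate array entries, and changes in whitespace between the tags should not matter.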

1 Answer


You have a few errors there.

<?php
$a = '  id="member-addresses">
                <h2>Contact details</h2>
                    <h3 id="foobar">Work</h3>
                        <p>
                            123 Fake Street, Main Area, PG42 TGJ<br />

                            Tel: 020 9 555 42589<br />
</p>
</div>';
preg_match('~Work</h3>\s+<p>(.*?)<br />~is', $a, $address_work);
print_r($address_work);

First, you are using '/' as the regex delimiter, so you need to escape every literal '/' in the pattern; the unescaped one in </h3> is read as the end of the pattern. I've swapped the delimiter to tildes because I've found no other use for those.

Second, there is whitespace between the closing h3 and the p, so the pattern has to allow for it. If the whitespace is optional, change \s+ to \s*; the + requires at least one whitespace character.

Third, you aren't grouping what you're searching for. With the parentheses in place, the captured text ends up in $address_work[1].

A potential fourth issue is the PCRE modifiers. The i after the closing tilde makes the match case-insensitive, which you might not want. The s makes the . match newlines as well as any other character. See http://php.net/manual/en/reference.pcre.pattern.modifiers.php
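
To show how the captured group is read back, here is a minimal sketch that uses the same markup as the question. The variable names and the second pattern for the telephone number are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Sample markup copied from the question.
$html = 'id="member-addresses">
                <h2>Contact details</h2>
                    <h3 id="foobar">Work</h3>
                        <p>
                            123 Fake Street, Main Area, PG42 TGJ<br />

                            Tel: 020 9 555 42589<br />
</p>
</div>';

$address = null;
$telephone = null;

// Group 1 holds the text between <p> and the first <br />.
// \s* is used here so the match also works when there is no whitespace.
if (preg_match('~Work</h3>\s*<p>(.*?)<br />~s', $html, $m)) {
    $address = trim($m[1]);
}

// Hypothetical second pattern: capture what follows "Tel:" up to the next <br />.
if (preg_match('~Tel:\s*(.*?)<br />~s', $html, $m)) {
    $telephone = trim($m[1]);
}

echo $address, "\n";
echo $telephone, "\n";
```

Index 0 of the match array is the whole matched string including the tags, so the value you usually want is at index 1.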

chris85