
I am new to PHP. As part of a course homework assignment, I am required to extract data from a website and use that data to render a table.

P.S.: Using regex is not a good option, but we are not allowed to use any library such as DOM, jQuery, etc.

The charset is UTF-8.

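```php
<?php
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL); // raw HTML of the page as one string

// Capture from the first "<form" up to the last "</form>";
// the s modifier lets "." match newlines as well.
$patternform = '/<form(.*)<\/form>/sm';
preg_match_all($patternform, $html, $matches);
```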

The regex works fine here, but when I apply the same regex to the `table` tag, it returns an empty array. Does this have something to do with whitespace in `$html`?

What is wrong here?
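When `preg_match_all()` comes back empty, it helps to separate "the pattern matched zero times" from "PCRE gave up". The function returns the match count (0 is a legitimate "no match") or `false` on failure, and `preg_last_error()` reports the reason. One cause worth ruling out with a greedy `.*` on a large page is exhausting `pcre.backtrack_limit`. This is a minimal sketch, assuming a small inline HTML string in place of the live allmusic page; `findAll` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: run a pattern and explain WHY nothing came back.
// preg_match_all() returns the number of full matches (0 = no match),
// or false when PCRE itself failed; preg_last_error() says which error.
function findAll(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        $reasons = [
            PREG_INTERNAL_ERROR        => 'internal PCRE error',
            PREG_BACKTRACK_LIMIT_ERROR => 'pcre.backtrack_limit exhausted',
            PREG_RECURSION_LIMIT_ERROR => 'pcre.recursion_limit exhausted',
            PREG_BAD_UTF8_ERROR        => 'subject is not valid UTF-8',
        ];
        $code = preg_last_error();
        throw new RuntimeException($reasons[$code] ?? "PCRE error code $code");
    }
    return $matches[0]; // every full match, in document order
}

// Small stand-in for the downloaded page (the real one is much larger).
$html = "<form action=\"/search\">\n  <input name=\"q\">\n</form>\n"
      . "<table class=\"results\">\n  <tr><td>The Beatles</td></tr>\n</table>\n";

// s: "." also matches newlines; i: case-insensitive tag names;
// u: treat pattern and subject as UTF-8; .*? stops at the nearest close tag.
$tables = findAll('/<table\b.*?<\/table>/siu', $html);
echo count($tables), " table(s) found\n";
```

If the real page triggers the exception, the message points at the cause directly instead of leaving an unexplained empty array.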

Frank Shearar
Margi
  • Why are you not allowed... homework? – Jason McCreary Mar 08 '13 at 18:59
  • You should read this: [How to parse and process HTML/XML with PHP](http://stackoverflow.com/q/3577641/1592648). Any class telling you to use regex over DOM is a class you should un-enroll from and get a refund. – kittycat Mar 08 '13 at 18:59
  • What information do you need? Target only the specific fields you need, build an array of objects, and then display them in a table. Where are you stuck exactly? – Tchoupi Mar 08 '13 at 19:01
  • We are not allowed to use any external library because they want us to learn the hard way, lose our sleep, get cranky and then post questions on forums for HELP!! – Margi Mar 08 '13 at 19:02
  • @Margi PHP DOM is not an external library; it is part of PHP. Check out the link above. – kittycat Mar 08 '13 at 19:03
  • @MathieuImbert: My regex for extracting the table returns an empty array: `'/<table(.*)<\/table>/sm'` – Margi Mar 08 '13 at 19:04
  • The Prof has stated in the homework specs that we must not use the DOM APIs. – Margi Mar 08 '13 at 19:06
  • You need the whole table? Then `'/(<table.*<\/table>)/sm'` works for me. – Tchoupi Mar 08 '13 at 19:06
  • That might not work due to new lines, try /([^]*?)<\/table>/sm – QuentinUK Mar 08 '13 at 19:12
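The approach suggested in the comments (target only the fields you need, collect them in an array, then display them in a table) can be done with `preg_match_all()` alone, which stays within the no-DOM rule. This is a minimal sketch under stated assumptions: the `<div class="name"><a href="...">` markup and the helper names `extractNames` and `renderTable` are hypothetical, not taken from allmusic's actual page, so the pattern would need adjusting to the real source.

```php
<?php
// Sketch: pull only the fields you need into an array, then build the table.
// The markup below and the class="name" hook are hypothetical; inspect the
// real page source and adjust the pattern to what it actually uses.

function extractNames(string $html): array
{
    // Lazy (.*?) keeps each match inside its own <a>...</a>.
    preg_match_all('/<div class="name">\s*<a[^>]*>(.*?)<\/a>/siu', $html, $m);
    // $m[1] holds capture group 1 (the link text) for every match.
    return array_map(function ($name) {
        return html_entity_decode(trim($name), ENT_QUOTES, 'UTF-8');
    }, $m[1]);
}

function renderTable(array $names): string
{
    $rows = '';
    foreach ($names as $i => $name) {
        // Re-escape on output so scraped text cannot inject markup.
        $rows .= sprintf(
            "<tr><td>%d</td><td>%s</td></tr>\n",
            $i + 1,
            htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        );
    }
    return "<table>\n<tr><th>#</th><th>Artist</th></tr>\n" . $rows . "</table>\n";
}

$html = '<div class="name"><a href="/artist/1">The Beatles</a></div>'
      . '<div class="name"><a href="/artist/2">Beatles &amp; Friends</a></div>';

echo renderTable(extractNames($html));
```

Running it prints a header row plus two artist rows; the `&amp;` in the second name is decoded when extracted and escaped again when rendered, so the array holds plain text while the output stays valid HTML.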

1 Answer


The following code produces a good result:

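```php
<?php
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL);

// Capture from the first "<table" up to the last "</table>" on the page;
// the s modifier lets "." match newlines as well.
$patternform = '/(<table.*<\/table>)/sm';
preg_match_all($patternform, $html, $matches);

echo $matches[0][0]; // the full text of the first match
```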

Result:

[Screenshot of the resulting table, not reproduced here]

Tchoupi
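In the answer's pattern `(<table.*<\/table>)`, the `.*` is greedy: with the `s` modifier it runs to the end of the page and backtracks to the *last* `</table>`, so `$matches[0][0]` covers every table on the page plus everything between them. If the tables are needed one at a time, switching to the lazy `.*?` gives one match per table (assuming the tables are not nested). A small sketch on a made-up two-table string:

```php
<?php
// Greedy vs. lazy quantifiers on a (made-up) page with two tables.
$html = "<table id=\"a\"><tr><td>1</td></tr></table>\n"
      . "<p>between</p>\n"
      . "<table id=\"b\"><tr><td>2</td></tr></table>\n";

// Greedy: .* runs to the end of the string and backtracks to the LAST
// </table>, so a single match spans both tables and the text between them.
preg_match_all('/<table.*<\/table>/s', $html, $greedy);

// Lazy: .*? stops at the FIRST </table> after each <table, giving one
// match per table (this breaks down if tables are nested).
preg_match_all('/<table.*?<\/table>/s', $html, $lazy);

echo count($greedy[0]), " greedy match(es)\n"; // 1
echo count($lazy[0]), " lazy match(es)\n";     // 2
```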