
I have an HTML table, generated by another website, that I'm trying to convert to a PHP array.

I cannot convert it using SimpleXML because the generated table's markup is not valid and causes a lot of errors. I also need to keep some attributes of the table's td elements and remove the others.

What would be the most efficient way of doing this? Or do you know of any PHP class that could help me achieve this?

BTW: What I'm trying to do is convert a school schedule to a PHP array that I can work with afterwards.

Here is an example of the data I retrieve: http://paste2.org/p/1869193

By the way, I already use PHP's strip_tags to remove unnecessary tags such as span and font.

Brian Tompsett - 汤莱恩
  • Try this: http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php, although it might not work because the HTML is not valid. – Maher4Ever Jan 15 '12 at 22:49
  • Thank you! It cleans up my HTML, so I might be able to work with this. – Jan 15 '12 at 22:56
  • Great, I'll post an answer as it might be helpful to someone else too. – Maher4Ever Jan 15 '12 at 23:01

2 Answers


You can also use PHP's Tidy extension if it is installed (it is enabled by default on some installs). It not only cleans up the HTML, but also lets you traverse the DOM:

http://www.php.net/manual/en/book.tidy.php
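
As a rough illustration of that approach, here is a minimal sketch that repairs the markup with Tidy when the extension is loaded, then walks the rows with DOMDocument and keeps only a whitelist of cell attributes. The function name `tableToArray`, the `$keepAttributes` parameter, and the shape of the returned array are my own assumptions for illustration; they are not part of Tidy or of the asker's code.

```php
<?php
// Hypothetical helper (not a built-in): repair messy HTML, then turn every
// <tr> into an array of cells, keeping only the attributes listed in
// $keepAttributes.
function tableToArray(string $html, array $keepAttributes = ['colspan', 'rowspan']): array
{
    // Tidy is an optional extension, so only use it when it is available.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string(
            $html,
            ['output-xhtml' => true, 'show-body-only' => true],
            'utf8'
        );
        if ($repaired !== false) {
            $html = $repaired;
        }
    }

    // DOMDocument::loadHTML tolerates invalid markup; collect its warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // The XML declaration prefix is a common trick to make libxml read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    // Note: this also picks up rows of nested tables, if there are any.
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Skip whitespace text nodes and anything that is not a cell.
            if (!$cell instanceof DOMElement || !in_array($cell->tagName, ['td', 'th'], true)) {
                continue;
            }
            $attributes = [];
            foreach ($keepAttributes as $name) {
                if ($cell->hasAttribute($name)) {
                    $attributes[$name] = $cell->getAttribute($name);
                }
            }
            $cells[] = [
                'text'       => trim($cell->textContent),
                'attributes' => $attributes,
            ];
        }
        $rows[] = $cells;
    }

    return $rows;
}
```

Calling `tableToArray($html)` on the downloaded page should give one entry per row, each holding the cell text plus whichever whitelisted attributes were present. Because attributes are filtered here, running strip_tags beforehand is probably unnecessary, since `textContent` already ignores inline tags such as span and font.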

SteveK

You can find a list of HTML parsers in the answers to the following question on SO: Robust and Mature HTML Parser for PHP

Maher4Ever
  • I'm not really sure it is even possible to build a parser that fixes HTML before parsing it. I think your best bet is to fix the HTML yourself before feeding it to any parser. – Maher4Ever Jan 15 '12 at 23:15