0

I have a massive table with 5 columns and I need to remove the 4th and 5th.

For example, I want to go from this:

<td><a href="http://sk.wikipedia.org/wiki/%C3%81belov%C3%A1" title="Ábelová">Ábelová</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Okres_Lu%C4%8Denec" title="Okres Lučenec">Lučenec</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Banskobystrick%C3%BD_kraj" title="Banskobystrický kraj">Banskobystrický kraj</a></td>
<td></td>
<td>Ábelfalva</td>

to this:

<td><a href="http://sk.wikipedia.org/wiki/%C3%81belov%C3%A1" title="Ábelová">Ábelová</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Okres_Lu%C4%8Denec" title="Okres Lučenec">Lučenec</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Banskobystrick%C3%BD_kraj" title="Banskobystrický kraj">Banskobystrický kraj</a></td>

in every row.

Brian Tompsett - 汤莱恩

4 Answers

1

Use PHP's DOM extension or any other DOM parser, and use an XPath like

/html/body/drill/down/to/your/table/tr/td[position() = 4 or position() = 5]

How to remove nodes from a DOMDocument has been answered countless times before. See some of my previous answers on how to do that with DOM or use the search function please.
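
A minimal sketch of that approach with DOMDocument and DOMXPath follows. The sample markup is a simplified, hypothetical stand-in for the real table (ASCII only, placeholder hrefs), and the XPath is loosened to //tr/td[...] because the actual path to your table is not known here.

```php
<?php
// Hypothetical, simplified sample row; the real table would be loaded from your own file.
$html = <<<HTML
<table>
<tr>
<td><a href="#a">Abelova</a></td>
<td><a href="#b">Lucenec</a></td>
<td><a href="#c">Banskobystricky kraj</a></td>
<td></td>
<td>Abelfalva</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about the fragment quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Collect the 4th and 5th cell of every row first, then remove them.
$cells = iterator_to_array(
    $xpath->query('//tr/td[position() = 4 or position() = 5]')
);
foreach ($cells as $cell) {
    $cell->parentNode->removeChild($cell);
}

// Serialize only the table element, not the html/body wrapper loadHTML() adds.
$table  = $dom->getElementsByTagName('table')->item(0);
$result = $dom->saveHTML($table);
echo $result;
```

Copying the matches into an array before removing them keeps the loop independent of any changes to the document while it runs.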

Community
Gordon
0

preg_replace can be a solution. You can also load your file into a SimpleXML object, then use a simple loop with a counter and output every td in each tr except the 4th and 5th. You could also use preg_split, but that would be hard, so SimpleXML is the best way IMO. Good luck.
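
A rough sketch of the SimpleXML loop, assuming the table is well-formed XML (SimpleXML will not load sloppy HTML). The sample data is a hypothetical, simplified row.

```php
<?php
// Hypothetical, simplified sample; SimpleXML needs well-formed markup.
$xml = <<<XML
<table>
  <tr>
    <td><a href="#a">Abelova</a></td>
    <td><a href="#b">Lucenec</a></td>
    <td><a href="#c">Banskobystricky kraj</a></td>
    <td></td>
    <td>Abelfalva</td>
  </tr>
</table>
XML;

$table  = simplexml_load_string($xml);
$output = '';

foreach ($table->tr as $row) {
    $output .= "<tr>\n";
    $counter = 0;
    foreach ($row->td as $cell) {
        $counter++;
        if ($counter === 4 || $counter === 5) {
            continue; // skip the 4th and 5th cell of this row
        }
        $output .= $cell->asXML() . "\n"; // keeps the cell's inner markup
    }
    $output .= "</tr>\n";
}

echo $output;
```

This builds a new string rather than modifying the loaded document, which matches the "display all td except the 4th and 5th" idea.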

Cedric
-1

Use preg_replace with a pattern that matches a td without an a element in it, and replace it with an empty string. Or use the DOM extension.
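
One possible pattern, as a sketch only: it assumes the cells to drop contain plain text (or nothing) and no nested tags at all, which holds for the sample in the question but may not hold for your whole table. The sample below is simplified and hypothetical.

```php
<?php
// Hypothetical, simplified sample row taken from the shape shown in the question.
$html = <<<HTML
<td><a href="#a">Abelova</a></td>
<td><a href="#b">Lucenec</a></td>
<td><a href="#c">Banskobystricky kraj</a></td>
<td></td>
<td>Abelfalva</td>
HTML;

// <td>, then any run of characters that are not "<" (so no nested tags such as <a>),
// then </td> plus trailing whitespace. Cells containing a link are left alone.
$cleaned = preg_replace('~<td>[^<]*</td>\s*~u', '', $html);

echo $cleaned;
```

If any of the first three columns can hold plain text without a link, this pattern would remove those cells too, so check that against the real data first.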

Radek Benkel
  • 1
    I didn't downvote your answer, but I'll give you a link: http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege – acm Feb 01 '11 at 09:46
  • 1
    @andre matos: The content of the HTML seems to be well-defined in this case, so I don't think it's necessarily too hard to do this with a regex. – abesto Feb 01 '11 at 09:48
  • @andre matos: I was referring to the concrete code in the question; for that code preg_replace will work fine. But thanks for the link :) – Radek Benkel Feb 01 '11 at 09:52
  • Your solution would also be right, if I were able to create such a regex :D – picitujeromanov Feb 01 '11 at 10:15
-1

You can indeed use preg_replace, but I'd suggest a regex that matches the 4th and 5th <td>.*</td> substrings. A less elegant solution (but simpler, if you don't know regex) is to call strpos multiple times, using the result of each call as the offset of the next.
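
The strpos variant could look roughly like this. dropFourthAndFifthCell is a hypothetical helper name, and the sketch assumes it is given one row at a time, with no nested tables inside the cells.

```php
<?php
// Hypothetical helper: returns $row with its 4th and 5th <td>...</td> cut out.
function dropFourthAndFifthCell(string $row): string
{
    $offset = 0;
    $start  = false;

    // Chain strpos calls: each search starts just past the previous match,
    // so after four rounds $start points at the 4th "<td".
    for ($i = 1; $i <= 4; $i++) {
        $start = strpos($row, '<td', $offset);
        if ($start === false) {
            return $row; // fewer than four cells: leave the row unchanged
        }
        $offset = $start + 1;
    }

    // From the 4th cell on, skip past two closing tags (4th and 5th cell).
    $end = $start;
    for ($i = 4; $i <= 5; $i++) {
        $close = strpos($row, '</td>', $end);
        if ($close === false) {
            return $row; // no 5th cell: leave the row unchanged
        }
        $end = $close + strlen('</td>');
    }

    // Keep everything before the 4th cell and everything after the 5th.
    return substr($row, 0, $start) . substr($row, $end);
}

// Simplified, hypothetical row for illustration.
$row = '<td>a</td><td>b</td><td>c</td><td></td><td>e</td>';
echo dropFourthAndFifthCell($row);
```

For a whole table you would apply the helper to each <tr>...</tr> chunk separately.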

abesto