How to remove n-th html element in a string in PHP

Question

I have a massive table with 5 columns, and I need to remove the 4th and 5th columns.

Example:

<td><a href="http://sk.wikipedia.org/wiki/%C3%81belov%C3%A1" title="Ábelová">Ábelová</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Okres_Lu%C4%8Denec" title="Okres Lučenec">Lučenec</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Banskobystrick%C3%BD_kraj" title="Banskobystrický kraj">Banskobystrický kraj</a></td>
<td></td>
<td>Ábelfalva</td>

to this:

<td><a href="http://sk.wikipedia.org/wiki/%C3%81belov%C3%A1" title="Ábelová">Ábelová</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Okres_Lu%C4%8Denec" title="Okres Lučenec">Lučenec</a></td>
<td><a href="http://sk.wikipedia.org/wiki/Banskobystrick%C3%BD_kraj" title="Banskobystrický kraj">Banskobystrický kraj</a></td>

in every row.

score 1 · Accepted Answer · edited May 23 '17 at 09:58

Use PHP's DOM extension, or any of the DOM parsers suggested in "Best methods to parse HTML", and use an XPath like

/html/body/drill/down/to/your/table/tr/td[position() = 4 or position() = 5]

How to remove nodes from a DOMDocument has been answered countless times before. See some of my previous answers on how to do that with DOM or use the search function please.
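As a rough illustration of the DOM and XPath approach described above, here is a minimal sketch. The function name `removeColumns` and the sample markup are made up for this example; it assumes the input contains a `<table>` element and uses a relative `//tr/td[...]` path instead of the full path from the document root. The `<?xml encoding="UTF-8">` prefix is a commonly used workaround to make `loadHTML()` treat the input as UTF-8, not something the answer itself prescribes.

```php
<?php
// Hypothetical helper: removes the given 1-based cell positions from every row.
function removeColumns(string $html, array $positions): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep warnings about imperfect markup quiet
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    // Build a predicate such as "position() = 4 or position() = 5".
    $predicate = implode(' or ', array_map(
        fn (int $p): string => "position() = $p",
        $positions
    ));

    $xpath = new DOMXPath($dom);
    $cells = $xpath->query("//tr/td[$predicate]");

    // Iterate over a copy of the result so removals cannot disturb the loop.
    foreach (iterator_to_array($cells) as $cell) {
        $cell->parentNode->removeChild($cell);
    }

    // Serialize only the first table (assumed to exist), not the html/body wrapper.
    $table = $dom->getElementsByTagName('table')->item(0);
    return $dom->saveHTML($table);
}

// Made-up sample row with five cells.
$html = '<table><tr><td>one</td><td>two</td><td>three</td><td></td><td>five</td></tr></table>';
$cleaned = removeColumns($html, [4, 5]);
echo $cleaned;
```

Because the XPath predicate is evaluated per row, the same call handles a table with any number of rows.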

answered Feb 01 '11 at 09:53

Gordon


score 0 · Answer 2 · answered Feb 01 '11 at 09:45

preg_replace can be a solution. You can also load your file into a SimpleXML object, then use a simple loop with a counter and output every td in each tr that is neither the 4th nor the 5th. You could also use preg_split, but that would be hard. So SimpleXML is the best way, IMO. Good luck.
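The SimpleXML loop with a counter could look roughly like the sketch below. It assumes the table is well-formed XML (SimpleXML will refuse markup with unclosed tags or unescaped `&` characters), and the sample markup and variable names are invented for the example.

```php
<?php
// Made-up, well-formed sample: one row with five cells.
$xml = '<table><tr><td>one</td><td>two</td><td>three</td><td></td><td>five</td></tr></table>';
$table = simplexml_load_string($xml);

$output = '';
foreach ($table->tr as $row) {
    $output .= "<tr>\n";
    $position = 0;                      // 1-based counter of cells in this row
    foreach ($row->td as $cell) {
        $position++;
        if ($position === 4 || $position === 5) {
            continue;                   // skip the 4th and 5th cell
        }
        $output .= $cell->asXML() . "\n";
    }
    $output .= "</tr>\n";
}
echo $output;
```

Note that this rebuilds the rows as a string rather than modifying the table in place, which matches the "display all td" wording of the answer.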

score -1 · Answer 3 · answered Feb 01 '11 at 09:42

Use preg_replace with a pattern that matches a td without an a element in it, and replace it with an empty string. Or use the DOM extension.

Radek Benkel


I didn't downvote your answer, but I'll give you a link: http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege – acm Feb 01 '11 at 09:46

@andre matos: The content of the HTML seems to be well-defined in this case, so I don't think it's necessarily too hard to do this with a regex. – abesto Feb 01 '11 at 09:48
@andre matos: I was referring to concrete code - for that code preg_replace will work fine. But thanks for link :) – Radek Benkel Feb 01 '11 at 09:52
your solution would be also right, if i were able to create such a regex :D – picitujeromanov Feb 01 '11 at 10:15

score -1 · Answer 4 · answered Feb 01 '11 at 09:47

You can indeed use preg_replace, but I'd suggest a regex that matches the 4th and 5th <td>.*</td> substrings. A less elegant solution (but simpler, if you don't know regex) is to call strpos multiple times, using the result of each call as the offset of the next.
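One way such a regex might be written is sketched below. It only holds up under the assumptions that every row has exactly five cells and that cells are never nested; on less regular markup a DOM parser is the safer choice. The sample row and variable names are invented for the example.

```php
<?php
// Made-up sample row, laid out one cell per line like the question's example.
$row = "<tr>\n<td>one</td>\n<td>two</td>\n<td>three</td>\n<td></td>\n<td>five</td>\n</tr>";

// One cell: opening tag with optional attributes, lazy content, closing tag,
// plus any whitespace that follows it.
$cell = '<td\b[^>]*>.*?</td>\s*';

// Capture the first three cells, then match (and drop) the next two.
// The "s" modifier lets "." match newlines inside a cell.
$pattern = "~((?:$cell){3})(?:$cell){2}~s";

$result = preg_replace($pattern, '$1', $row);
echo $result;
```

Since preg_replace replaces every non-overlapping match, running the same pattern over the whole table string processes each five-cell row in turn.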


abesto


i'm lame in regex, can you please show an example how to do it ? – picitujeromanov Feb 01 '11 at 09:50
