Find the end of a tag pair

Question

I have the following HTML:

<table id="needle"><tr><td>X</td></tr><table>...</table><table>...</table></table>

I found the position of X, and then the position of the opening < of the #needle table. How do I find the position of the last > of the closing tag that pairs with the #needle table?

*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, Nov 15 '11 at 09:22
*(reference)* http://php.net/manual/en/domdocument.getelementbyid.php — Gordon, Nov 15 '11 at 09:24
possible duplicate of [read XML tag id from php](http://stackoverflow.com/questions/3035310/read-xml-tag-id-from-php) — Gordon, Nov 15 '11 at 09:25
@Gordon It's not really a duplicate. He's looking for the end tag, not the content. — Madara's Ghost, Nov 15 '11 at 09:29
@Truth To do what? What is the purpose of looking for the end tag if it's not for doing something with the content afterwards? Even if the OP really just wants to know the position, it is much easier to get that information from a parser. — Gordon, Nov 15 '11 at 09:47
To know where the tag ends? For instance? To add an element after it ends but before the rest of the document continues? There're a million things. Even if he does intend to use the content, the question is not an ***exact*** duplicate, as the answers to it will be entirely different. — Madara's Ghost, Nov 15 '11 at 09:50
@Truth All that implies that the OP wants to work with the DOM tree, so using a parser is the better choice. Also, if we only close on *exact duplicates* we won't close anything at all because of the many subtle differences. One guy wants to parse divs, the other links, etc. The linked duplicate is *exact enough*. — Gordon, Nov 15 '11 at 09:53
@VyacheslavLoginov please clarify the question. What is it you want to do? Your current approach sounds like you are doing it in a more difficult way than it has to be. — Gordon, Nov 15 '11 at 10:01
I have already parsed my page with simple_html_dom; now I want to cut the excess tags — Vyacheslav Loginov, Nov 15 '11 at 10:10
What do you mean by excess tags? Please provide an example with input and output so we can see what you are trying to do. It sounds like you are doing it wrong right now. Do you want to get the innerHTML (in SimpleHTMLDOM: innerText attribute of #needle element)? — Gordon, Nov 15 '11 at 10:11
It's post-parsing manipulation: I want to remove some parts of the parsed content — Vyacheslav Loginov, Nov 15 '11 at 10:14
That is as clear as mud. Please give an input and output example. And also show us some of your code. — Gordon, Nov 15 '11 at 10:17

score 2 · Accepted Answer · edited May 23 '17 at 12:21

You could use html5lib. Less recommended, but it works too: YQL by Yahoo.

EDIT: removed the regex suggestion because of the comments. Probably the best overview is the one Gordon linked in his comment: How do you parse and process HTML/XML in PHP?
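
To illustrate the parser-based route for the literal question (where does the matching closing tag end?), here is a minimal sketch. It is an assumption-laden example, not the method anyone in this thread used: it is written in Python with the standard library's `html.parser` rather than PHP or html5lib, and `EndTagLocator` and `find_element_span` are hypothetical names invented for this example. It counts nested tags of the same name so that the inner tables do not end the search early.

```python
from html.parser import HTMLParser


class EndTagLocator(HTMLParser):
    """Record the source offsets where the element with a given id starts and ends."""

    def __init__(self, source, target_id):
        super().__init__()
        self.source = source
        self.target_id = target_id
        # Absolute offset of the first character of each line; getpos() reports
        # a 1-based line number and a 0-based column, counting "\n" only.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.tag = None    # tag name of the target element, once found
        self.depth = 0     # open tags with that name, counted from the target
        self.start = None  # offset of the "<" of the opening tag
        self.end = None    # offset of the ">" of the matching closing tag

    def _offset(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if self.end is not None:
            return  # target already closed, ignore the rest of the document
        if self.tag is None:
            if dict(attrs).get("id") == self.target_id:
                self.tag, self.depth, self.start = tag, 1, self._offset()
        elif tag == self.tag:
            self.depth += 1  # nested element with the same tag name

    def handle_endtag(self, tag):
        if self.tag is None or self.end is not None or tag != self.tag:
            return
        self.depth -= 1
        if self.depth == 0:
            # getpos() points at the "<" of the closing tag; its ">" follows.
            self.end = self.source.index(">", self._offset())


def find_element_span(source, target_id):
    """Return (start, end) offsets of the element, or (None, None) if not found."""
    locator = EndTagLocator(source, target_id)
    locator.feed(source)
    locator.close()
    return locator.start, locator.end


html = '<table id="needle"><tr><td>X</td></tr><table>...</table><table>...</table></table>'
start, end = find_element_span(html, "needle")
print(html[start:end + 1])
```

Note that `html.parser` only tokenizes: it does not repair malformed markup the way a browser or html5lib would, so on unbalanced input the reported end position may not match what a DOM parser would consider the end of the element.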

answered Nov 15 '11 at 09:26 by endo.anaconda · edited by Community

Don't parse XML with regular expressions. – Madara's Ghost Nov 15 '11 at 09:28
Is it possible or not? I think Gordon's link is the most useful: http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php/3577662#3577662 – endo.anaconda Nov 15 '11 at 09:31
It's possible, but [you shouldn't](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Madara's Ghost Nov 15 '11 at 09:33
