I am aware that this question has been asked before, for example here: XPath select all elements between two specific elements
But there, and in a few other Google hits, they use hard-coded values to select specific data.
What I would like to do is get a list of the text elements grouped under each parent (the divider that precedes them):
<doc>
<divider />
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
<divider />
<p>text</p>
<p>text</p>
<divider />
<p>text</p>
<divider />
</doc>
To get the first group of text elements you can do:
/*/p[count(preceding-sibling::divider)=1]
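For reference, here is a minimal sketch of what that predicate computes, using only the standard library's xml.etree.ElementTree. ElementTree's limited XPath subset does not support count() or the preceding-sibling axis, so the expression above cannot be passed to findall() as-is. The sketch walks the children by hand instead. The helper name p_after_n_dividers is hypothetical, and n stands in for the hard-coded 1.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
XML = """<doc>
<divider />
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
<divider />
<p>text</p>
<p>text</p>
<divider />
<p>text</p>
<divider />
</doc>"""


def p_after_n_dividers(root, n):
    """Hypothetical helper: return the <p> children of root that have
    exactly n preceding <divider> siblings, i.e. what
    /*/p[count(preceding-sibling::divider)=n] would select."""
    selected = []
    dividers_seen = 0
    for child in root:  # direct children, in document order
        if child.tag == "divider":
            dividers_seen += 1
        elif child.tag == "p" and dividers_seen == n:
            selected.append(child)
    return selected


root = ET.fromstring(XML)
first_group = p_after_n_dividers(root, 1)
# tostring() includes the trailing whitespace after each element, so strip it.
print([ET.tostring(p, encoding="unicode").strip() for p in first_group])
```

With n=1 this should pick out the five p elements after the first divider, but it still needs the divider number to be supplied by hand.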
But what I want as output is something like this:
[['<doc>'], ['<p>text</p>', '<p>text</p>', '<p>text</p>', '<p>text</p>', '<p>text</p>'], ['<p>text</p>', '<p>text</p>'], ['<p>text</p>']]
That gives a list of every text element for divider 1, divider 2, ... divider x,
which is what this Python code produces:
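# Group the lines of the input file by the divider that precedes them.
matches = []
tmp = []
with open("inputfile", "r") as data:
    for line in data:
        currentLine = line.strip()
        if 'divider' in currentLine:
            # A divider closes the current group; empty groups are skipped.
            if len(tmp) > 0:
                matches.append(tmp)
            tmp = []
        else:
            tmp.append(currentLine)
print(matches)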
Yes, there's a 'doc' at the beginning; it's just an example, not perfect. With this code you could also save the parent in the same list, but in the test data that is always divider, so I did not do it.
What's the XPath magic for this?