XPath - How to extract specific part of the text from one text node

Question

I would like to extract only part of the text from a td element, for example "FLAC". How can this be done using XPath?

I've tried //text()[contains(., 'FLAC')], but it returns the whole text node.

<tr>
    <td class="left">Format plików</td>
    <td>
        AVI, FLV, RM, RMVB, FLAC, APE, AAC, MP3, WMA, OGG, BMP, GIF, TXT, JPEG, MOV, MKV, DAT, DivX, XviD, MP4, VOB
    </td>
</tr>

You already know that the text node contains "FLAC" -- why would you then extract it? Just use the string "FLAC" -- I really don't understand what you want to do... - Dimitre Novatchev, May 15 '12 at 13:08
I know that the text node contains "FLAC", but I want to extract only this specific word from the text node, not the whole node. - Mateusz Malinowski, May 15 '12 at 13:12
But *why* extract it from the node when you already have it as a literal string? - Dimitre Novatchev, May 15 '12 at 13:14
OK, I'll try to explain. I got a new task at work and I need to use XPath for it. This particular XPath should work this way: "If the word "FLAC" appears in the <td> node that is a sibling of the <td class="left"> node, then extract this word". I hope it's clear now. If not, sorry, I'm a beginner in this subject. - Mateusz Malinowski, May 15 '12 at 13:44
That's not really what your question was asking, though. When you first post a question, include as much detail as possible so there isn't unnecessary back-and-forth in the comments. - JWiley, May 15 '12 at 13:48
No, after you determine that the text node contains the literal "FLAC", it isn't necessary at all to "extract" this literal from the text node -- you already have the string "FLAC" in your code -- why not use it directly? Why do unnecessary work and cause an unnecessary slowdown? - Dimitre Novatchev, May 15 '12 at 14:28
This is a misunderstanding: I'm not writing any code. I have to export some content from the site and put it into a Google spreadsheet. I'm making a list of products from the site (multimedia players) and I want to express a condition in XPath: "If the site says that the player can play FLAC files, give me back "FLAC"". The markup looks exactly as I pasted it. So I want to cut out just one word from the whole text by matching it, without counting words. "If a specific node contains the word FLAC, give me back this word". - Mateusz Malinowski, May 15 '12 at 15:42
You could just check for FLAC and return FLAC using substring... I'll update my answer. - JWiley, May 15 '12 at 17:31

score 11 · Accepted Answer · edited May 23 '17 at 10:30

You first have to specify where in the tree to look. Since you have multiple <td> elements, start by finding the node that contains the text.

substring(//tr/td[contains(@class, 'left')]/following-sibling::text()[1], startIndex, length)

or

substring(//tr/td[@class='left']/following-sibling::text()[1], startIndex, length)
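The startIndex and length placeholders above are a 1-based character position and a character count. As a rough illustration only, the Python sketch below mimics how those two numbers pick a word out of a string. The helper names and the shortened sample text are made up for this example. In XPath 1.0 the start position could likely be computed as string-length(substring-before(text, 'FLAC')) + 1, though that expression gives 1 when the word is absent, so it needs a contains() check alongside it.

```python
def start_and_length(text, word):
    """Return the 1-based start position and length of word in text, or None."""
    index = text.find(word)  # 0-based, -1 when the word is absent
    if index < 0:
        return None
    return index + 1, len(word)  # XPath positions start at 1


def substring_1_based(text, start, length):
    """Mimic substring(text, start, length) for whole numbers with start >= 1."""
    return text[start - 1:start - 1 + length]


# Shortened, hypothetical sample of the cell text from the question.
sample = "AVI, FLV, RM, RMVB, FLAC, APE"
position = start_and_length(sample, "FLAC")
if position is not None:
    print(substring_1_based(sample, *position))
```

Because the position depends on the text, hard-coded numbers only work when the cell content never changes.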

Update as per the comments:

contains(//tr/td[@class='left']/following-sibling::text()[1], 'FLAC')

This returns true or false depending on whether the sibling after the td with class "left" contains the word "FLAC". You could use substring() to grab a subset of that string, but that only works when the position of the word is fixed. For anything more dynamic I'd suggest a different method, such as XSLT, to alter or split the string. Hope this helps!

Update 2

substring('FLAC',1,4*contains(//tr/td[@class='left']/following-sibling::text()[1], 'FLAC'))

This returns "FLAC" if FLAC is present in the node you're inspecting, and an empty string if not.

Step-by-step breakdown:

//tr/td[@class='left'] - Selects every <td> node whose "class" attribute is exactly "left".
/following-sibling::text() - Selects the text nodes that are following siblings of the node above.
[1] - Keeps only the first node from the list above.
contains(aboveValue, 'FLAC') - Returns true if "FLAC" is present in the text and false if it is not. In the multiplication, true is converted to 1 and false to 0.
substring('FLAC', 1, 4*aboveValue) - This is the XPath 1.0 equivalent of an if/then/else, since there is no built-in conditional function. If "FLAC" is present, the length is 4*1 = 4, which yields the whole string. If "FLAC" is not present, the length is 4*0 = 0, which yields an empty string.
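
For readers who want to see the conditional trick outside of XPath, here is a minimal Python sketch of the same logic. It is an emulation, not an XPath engine: xpath_substring is a hand-written helper that follows the XPath 1.0 rounding rules for finite numbers, and value_cell_text assumes the format list is the text of the <td> that directly follows the td with class "left", as in the question's snippet. It selects that element directly instead of reproducing the following-sibling::text()[1] step.

```python
import math
import xml.etree.ElementTree as ET

# The row from the question, reused as sample input.
ROW = """
<tr>
    <td class="left">Format plików</td>
    <td>
        AVI, FLV, RM, RMVB, FLAC, APE, AAC, MP3, WMA, OGG, BMP, GIF, TXT, JPEG, MOV, MKV, DAT, DivX, XviD, MP4, VOB
    </td>
</tr>
"""


def xpath_round(number):
    """Round half up, as XPath 1.0 round() does for finite numbers."""
    return math.floor(number + 0.5)


def xpath_substring(text, start, length):
    """Keep characters at 1-based positions p with start <= p < start + length."""
    first = xpath_round(start)
    end = first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(text, start=1) if first <= p < end)


def value_cell_text(row_xml):
    """Return the text of the cell right after the class="left" cell (assumed layout)."""
    cells = list(ET.fromstring(row_xml))  # direct children of <tr>, in document order
    for label, value in zip(cells, cells[1:]):
        if label.get("class") == "left":
            return value.text or ""
    return ""


def word_or_empty(row_xml, word="FLAC"):
    """Emulate substring(word, 1, len * contains(cell, word))."""
    found = word in value_cell_text(row_xml)  # plays the role of contains()
    return xpath_substring(word, 1, len(word) * int(found))  # True -> 1, False -> 0


print(repr(word_or_empty(ROW)))
print(repr(word_or_empty(ROW.replace("FLAC, ", ""))))
```

The multiplication by 0 or 1 is what replaces the missing if/then/else: the literal is either returned whole or cut down to nothing.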

Another thing to note: contains() is case-sensitive, so if this field can contain "flac", it will return false. To match any mix of upper and lower case, normalize the text with translate() before calling contains().
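
In XPath 1.0 the case-insensitive check is usually written along the lines of contains(translate(text, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'FLAC'). The Python sketch below only emulates that idea so the behaviour can be tried out. xpath_translate and contains_ignoring_case are hypothetical helper names, and the search word is assumed to be upper-case ASCII.

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def xpath_translate(text, from_chars, to_chars):
    """Emulate XPath 1.0 translate(): map or drop characters one by one."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # the first occurrence in from_chars wins
        # Characters without a counterpart in to_chars are removed.
        mapping[ch] = to_chars[i] if i < len(to_chars) else None
    result = []
    for ch in text:
        if ch not in mapping:
            result.append(ch)  # unmapped characters pass through unchanged
        elif mapping[ch] is not None:
            result.append(mapping[ch])
    return "".join(result)


def contains_ignoring_case(text, upper_word):
    """Upper-case ASCII letters in text, then do a plain substring test."""
    return upper_word in xpath_translate(text, LOWER, UPPER)


print(contains_ignoring_case("mp3, Flac, ogg", "FLAC"))
print(contains_ignoring_case("mp3, ogg", "FLAC"))
```

Only ASCII letters are mapped here, so accented characters such as those in "plików" are left untouched.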

The XPath from the second update works in my case. May I ask one more thing: can you explain this query step by step? Thank you for your patience and for helping me figure it out. - Mateusz Malinowski, May 15 '12 at 18:32
