
I want to trim trailing whitespace at the end of all XHTML paragraphs. I am using Ruby with the REXML library.

Say I have the following in a valid XHTML file:

<p>hello <span>world</span> a </p>
<p>Hi there </p>
<p>The End </p>

I want to end up with this:

<p>hello <span>world</span> a</p>
<p>Hi there</p>
<p>The End</p>

So I was thinking I could use XPath to select just the text nodes I want and then trim them, which would give me the output shown above.

I started with the following XPath:

//root/p/child::text()

Of course, the problem here is that it returns every text node that is a child of any p element:

'hello '
' a '
'Hi there '
'The End '

Trying the following XPath gives me the last text node of the last paragraph, not the last text node of each paragraph that is a child of the root node.

//root/p/child::text()[last()]

This only returns: 'The End '

What I would like to get from the XPath is therefore:

' a '
'Hi there '
'The End '

Can I do this with XPath? Or should I be looking at regular expressions instead (which are probably more of a headache than XPath)?

mu is too short
Diego Barros

2 Answers


Your example worked for me:

//p/child::text()[last()]
nickf
  • that only gets the last result though, he wants all of them throughout the document – Jake Nov 03 '08 at 04:08
  • no, it gives the exact dataset he was asking for. It returns the last child text element of every p (in this case, three of them) – nickf Nov 03 '08 at 04:12
  • @nickf: You are correct. When you said it worked, I went and double checked. In doing so, it shows that the problem seems to be with the Ruby REXML library's implementation of XPath. Well, I won't say that until I investigate further. Could be a setting I need to pass to REXML (or some such thing) – Diego Barros Nov 03 '08 at 04:33
  • Sorry, I should have mentioned that I was using Ruby & REXML. I incorrectly assumed that XPath would be just XPath. – Diego Barros Nov 03 '08 at 04:35
  • It looks like it is a bug in REXML. – Diego Barros Nov 04 '08 at 07:57
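
If REXML's handling of the [last()] predicate is the obstacle, as the comments above suggest, one way around it is to skip the predicate and pick the last node of each paragraph in Ruby. The following is a minimal sketch rather than a definitive fix: trim_paragraph_tails is a hypothetical helper name, and the sample assumes the paragraphs sit inside a plain root element with no XML namespace.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): trims trailing whitespace
# from the final text node of every <p> element in the document.
def trim_paragraph_tails(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, '//p') do |para|
    last = para.children.last
    # Only touch paragraphs whose final child is a text node, so that
    # whitespace before an inline element such as <span> is preserved.
    next unless last.is_a?(REXML::Text)
    last.value = last.value.rstrip
  end
  doc.to_s
end

# Assumed sample input, based on the paragraphs in the question.
SAMPLE = <<~XML
  <root>
  <p>hello <span>world</span> a </p>
  <p>Hi there </p>
  <p>The End </p>
  </root>
XML

puts trim_paragraph_tails(SAMPLE)
```

Checking that the last child is a text node means a paragraph ending in an element (for example `<p>foo <b>bar</b></p>`) is left untouched, instead of having the space after "foo" removed by mistake.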

Just in case you didn't know, XPath (and therefore XSLT) has a normalize-space() function, which strips leading and trailing whitespace and also collapses internal runs of whitespace into a single space.

AmbroseChapel
  • Thanks for the response. Can normalize-space(), or a similar function, remove trailing spaces only (leaving any leading spaces alone)? – Diego Barros Nov 03 '08 at 08:07
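
On the trailing-only question: XPath 1.0 has no built-in function that trims just one side of a string, so the trimming is probably easier to do in Ruby once the text has been extracted. The sketch below contrasts the two behaviours; normalize_space is a hypothetical Ruby emulation of the XPath function written for illustration, not a REXML API.

```ruby
# Hypothetical emulation of XPath's normalize-space(): collapse runs of
# space, tab, CR and LF into one space, then drop a leading/trailing space.
def normalize_space(str)
  str.gsub(/[ \t\r\n]+/, ' ').sub(/\A /, '').sub(/ \z/, '')
end

text = "  Hi   there \n"

p normalize_space(text)  # "Hi there"      (both ends trimmed, inner runs collapsed)
p text.rstrip            # "  Hi   there"  (only trailing whitespace removed)
```

String#rstrip leaves leading and internal whitespace alone, which matches the behaviour asked for here.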