
Using Java's String.replaceAll, I have to replace the values of elements that may come with different namespace prefixes:

from

<tns:p>to be replaced</tns:p> 
<sss:p>to be replaced</sss:p> 

to

<tns:p>replaced</tns:p> 
<sss:p>replaced</sss:p>

Could you please help me find a regular expression for this replacement?

P.S. The elements may appear more than once in a given string:

<tns:p>to be replaced</tns:p>
<tns:w>not to be replaced</tns:w>
<tns:p>to be replaced</tns:p>

My problem is the variable namespace prefix in front of the element names. Without it, I would do this:

str = str.replaceAll("(?<=<p>)(.*?)(?=</p>)", "replacement");
edgmsh
  • And have you tried anything? Some code? – Jorge Campos May 21 '14 at 15:33
  • As with nearly every question in this tag, your description leaves quite a bit of ambiguity. Are you wanting to only retain the final word in the element? E.g. would `tell us more` become `more`? – Duncan Jones May 21 '14 at 15:34
  • It's reasonably easy to do, but what is it about the problem that you are having difficulty with? – Bohemian May 21 '14 at 15:35
  • Why would you want to use a regex rather than an XML API? That's almost *always* the wrong decision. – Jon Skeet May 21 '14 at 15:36
  • @Duncan I think he's given unfortunate examples for his strings: he wants `x` to become `y` – Bohemian May 21 '14 at 15:36
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – Jim Garrison May 21 '14 at 15:36
  • @Bohemian Ah, that sounds plausible. One day I'll write my ranting blog post about how to (not) specify a regex question and just start linking to that each time. – Duncan Jones May 21 '14 at 15:37
  • @JonSkeet because (in java at least) it's easier, uses way less code, and is a *lot* faster to use regex - all by a couple of orders of magnitude. – Bohemian May 21 '14 at 15:38
  • @Bohemian: And far harder to get *right*. Personally I'd rather have robust code that might be a bit slower than fast code which will fail as soon as it has slightly unusual (but valid) input. – Jon Skeet May 21 '14 at 15:39
  • @JonSkeet sure, code that works beats buggy/brittle code every time and regex doesn't lend itself to "parsing" xml, but in this case it's a problem in which the xml context is irrelevant: regex can tackle this easily - it's essentially a string replacement problem with a simple back-reference in the regex to be safe (you could even omit the back-reference if you knew the input was reliably well formed). – Bohemian May 21 '14 at 15:42
  • I'll look forward to ensuring that the answer works in the face of CDATA etc... – Jon Skeet May 21 '14 at 15:48
  • @JonSkeet OK - you got me with CDATA, but that seems to be out of scope (the CDATA would have to firstly exist and secondly itself contain a closing tag... unlikely) and over-engineering is bad too – Bohemian May 21 '14 at 15:56
  • Each to their own. Whenever I want to manipulate XML, I use an XML API. It's as simple as that for me. Admittedly I'd usually use LINQ to XML and C#, which is much nicer than any Java XML API I've used... – Jon Skeet May 21 '14 at 16:02
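
The XML API route raised in the comments can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the fragments are wrapped in a single root element that declares the prefixes (the `urn:example:...` URIs and the class and method names are made up), and the JDK's built-in DOM parser and identity transformer are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the thread.
class XmlApiReplaceSketch {

    // Sets the text of every element whose local name is "p", whatever its namespace.
    static String replacePText(String xml, String replacement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so local names and prefixes are separated
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // "*" as the namespace URI matches elements in any namespace.
            NodeList nodes = doc.getElementsByTagNameNS("*", "p");
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(replacement);
            }

            // Serialize the modified tree back to a string without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parsing and transforming throw checked exceptions; rethrown unchecked for brevity.
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:tns=\"urn:example:tns\" xmlns:sss=\"urn:example:sss\">"
                + "<tns:p>to be replaced</tns:p>"
                + "<tns:w>not to be replaced</tns:w>"
                + "<sss:p>to be replaced</sss:p>"
                + "</root>";
        System.out.println(replacePText(xml, "replaced"));
    }
}
```

This is clearly more code than a single replaceAll call, which is the trade-off debated above; in exchange the parser, not the pattern, deals with CDATA sections, attributes and whitespace inside tags.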

2 Answers


The problem is that look-behinds can't have unbounded length. But if your input is well formed (that is, tags are closed with matching tags), and the text to be replaced isn't a CDATA section that itself contains the closing tag (which seems unlikely), this will work:

str = str.replaceAll("(?<=[:<]p>)[^<]*(?=</(\\w+:)?p>)", "replacement");

This regex makes the replacement whether or not there's a namespace.


Here's some test code:

String str = "<p>to be replaced</p><tns:p>to be replaced</tns:p><tns:w>not to be replaced</tns:w><tns:p>to be replaced</tns:p>";
str = str.replaceAll("(?<=[:<]p>)[^<]*(?=</(\\w+:)?p>)", "replacement");
System.out.println(str);

Output:

<p>replacement</p><tns:p>replacement</tns:p><tns:w>not to be replaced</tns:w><tns:p>replacement</tns:p>

If your input may not be well formed, i.e. the closing tag's namespace may differ from the opening tag's, you can capture the namespace, use a back-reference to assert that it's the same in the closing tag, and put it back in the replacement:

str = str.replaceAll("(<((?:\\w+:)?)p>)[^<]*(?=</\\2p>)", "$1replacement");

The namespace is still optional (group 2 captures an empty string when there is no prefix, so the back-reference still succeeds), but now the namespace in the closing tag must match that of the opening tag.
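
A small check of the stricter behaviour, as a sketch only (the class and method names are made up for illustration). It writes the optional prefix as `((?:\\w+:)?)` so that group 2 always participates in the match, because in Java a back-reference to a group that did not participate fails:

```java
// Hypothetical demo class, not part of the original answer.
class StrictNamespaceDemo {

    // Group 1: the whole opening tag. Group 2: the prefix, possibly empty.
    // The lookahead requires the closing tag to carry the same prefix as group 2.
    static final String REGEX = "(<((?:\\w+:)?)p>)[^<]*(?=</\\2p>)";

    static String replace(String input) {
        // $1 puts the consumed opening tag back in front of the new text.
        return input.replaceAll(REGEX, "$1replacement");
    }

    public static void main(String[] args) {
        System.out.println(replace("<tns:p>old</tns:p>")); // <tns:p>replacement</tns:p>
        System.out.println(replace("<p>old</p>"));         // <p>replacement</p>
        System.out.println(replace("<tns:p>old</sss:p>")); // unchanged: prefixes differ
    }
}
```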

Bohemian
  • No worries. There's another option I just added in case you want to be stricter with the namespace matching. – Bohemian May 21 '14 at 16:03
  • Sorry, it worked for you because you separated elements with \n (new line). – edgmsh May 21 '14 at 16:11
  • For this case it won't work String str = "this it to be replacednot to be replacedthis it to be replaced"; – edgmsh May 21 '14 at 16:11
  • Fixed: The reluctant quantifier wasn't reluctant enough - I changed it to be "not a `<` char". Try the new version. – Bohemian May 21 '14 at 16:22

Lookbehinds in Java regex don't support unbounded repetition operators, so unfortunately this is not possible with a lookbehind in just one String#replaceAll(String, String) call.
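
That said, the lookbehind can be avoided altogether. A minimal sketch, assuming simple tags without attributes and text that contains no `<`: capture the opening and closing tags and write them back around the new text. The class and method names are made up, and unlike a back-reference version this one does not check that the two prefixes match.

```java
import java.util.regex.Matcher;

// Hypothetical demo class illustrating a capture-group alternative to lookbehind.
class CaptureGroupDemo {

    // Group 1: opening <p> tag with optional prefix. Group 2: closing </p> tag with optional prefix.
    static final String REGEX = "(<(?:\\w+:)?p>)[^<]*(</(?:\\w+:)?p>)";

    static String replace(String input, String replacement) {
        // quoteReplacement keeps any '$' or '\' in the new text from being read as group syntax.
        return input.replaceAll(REGEX, "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        String str = "<tns:p>to be replaced</tns:p>"
                + "<tns:w>not to be replaced</tns:w>"
                + "<sss:p>to be replaced</sss:p>";
        // Prints: <tns:p>replaced</tns:p><tns:w>not to be replaced</tns:w><sss:p>replaced</sss:p>
        System.out.println(replace(str, "replaced"));
    }
}
```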

DirkyJerky