4

I have to replace portions of a text, but only when the matching substrings are not contained between '<' and '>'.

For example, if I have the following text

<text color='blue'>My jeans are red</text>
<text color='red'>I am wearing a red t-shirt</text>
<text color='yellow'>I like red fruits</text>

and I want to replace the word "red" with another word, how can I replace the word in that text without replacing the ones contained between '<' and '>'? I tried to write a regular expression for that but I did not succeed...

A naive approach I thought of is to scan the whole text character by character, track whether I am inside or outside <...>, and replace an occurrence only when I am outside... I think there should be a smarter way!
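For reference, the character-by-character scan described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names (`TagScanReplace`, `replaceOutsideTags`) are made up for this example, '<' and '>' are assumed to appear only as tag delimiters, and the search word is assumed to be non-empty and free of those characters.

```java
class TagScanReplace {
    /**
     * Replaces every occurrence of target with replacement, except inside <...>.
     * Assumption: '<' and '>' occur only as tag delimiters in the input.
     */
    static String replaceOutsideTags(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // nothing sensible to replace
        }
        StringBuilder out = new StringBuilder();
        boolean insideTag = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<') {
                insideTag = true;   // entering a tag
            } else if (c == '>') {
                insideTag = false;  // leaving a tag
            }
            if (!insideTag && text.startsWith(target, i)) {
                out.append(replacement);
                i += target.length();
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "<text color='red'>I am wearing a red t-shirt</text>";
        // prints <text color='red'>I am wearing a blue t-shirt</text>
        System.out.println(replaceOutsideTags(s, "red", "blue"));
    }
}
```

The scan is a single pass, so it stays linear in the length of the text, but it knows nothing about comments, CDATA sections, or '>' inside attribute values.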

zanlura
Massimo
  • 4
    See [this answer](http://stackoverflow.com/a/1732454/721269) for why you should not try to parse (X)HTML with a regex. – David Schwartz Feb 06 '13 at 12:33
  • which `red` do you want to replace? the `color='red'` or `red t-shirt`? – Rain Diao Feb 06 '13 at 12:34
  • I want to replace the "red" in "red t-shirt". I am not trying to parse HTML. I already have HTML files and, if a user searches for a word in them, I want to replace its occurrences. I just don't want the occurrences between '<' and '>' to be replaced when a user searches for "color", "red", or simply "c" (as in the example). – Massimo Feb 06 '13 at 12:34
  • If your string `contentEquals` "red", then replace it with the other string. – Abhishekkumar Feb 06 '13 at 12:36
  • 2
    You may not want to parse it like a markup language file, but given what you want to do, you probably should. Run it through an XML parser and rebuild it, acting only on the text values rather than on tag names and attributes. – ggenglish Feb 06 '13 at 12:43
  • Yes... I probably should parse it... And I think in the end I will! – Massimo Feb 06 '13 at 12:48
  • @DavidSchwartz is correct. Actually, I just treated it as a puzzle, and it is obviously not worth any more effort. I have withdrawn my answer. – Rain Diao Feb 07 '13 at 16:07
  • I don't think it is necessary to reason about the problem any further. None of the solutions turned out to be as clean as I had imagined. In the end I followed ggenglish's advice: I parsed the document and worked only on the relevant strings... Thanks to everyone for taking the time to think about the problem! – Massimo Feb 07 '13 at 18:10
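A rough sketch of the parser route suggested in the comments, using only the JDK's built-in DOM API. Treat it as an illustration under assumptions, not as what the asker actually wrote: the class and method names are hypothetical, the input is assumed to be well-formed XML with a single root element (the three sibling `<text>` elements from the question would need a wrapper), and real-world HTML would likely need a dedicated HTML parser instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeReplace {
    /** Parses xml, replaces target in text nodes only, and serializes the result. */
    static String replaceInText(String xml, String target, String replacement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceInTextNodes(doc.getDocumentElement(), target, replacement);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    /** Walks the tree recursively; tag names and attributes are never touched. */
    private static void replaceInTextNodes(Node node, String target, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), target, replacement);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><text color='red'>I am wearing a red t-shirt</text></root>";
        System.out.println(replaceInText(xml, "red", "blue"));
    }
}
```

Note that the serializer may normalize details such as attribute quoting, so the output is equivalent XML rather than a byte-for-byte copy of the input with the word swapped.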

4 Answers

1

Would this work for you?

If you just want to do the replacement within a single line:

final String s = "<text color='red'>I am wearing a red t-shirt</color>";
System.out.println(s.replaceAll("(?<=>)(.*?)red", "$1blue"));

This prints:

<text color='red'>I am wearing a blue t-shirt</color>

Multi-line case:

final String s = "<text color='red'>I am wearing a red t-shirt</color>\n<text color='red'>You are wearing a red T-shirt</color>";
System.out.println(s.replaceAll("(?m)^(.*?)(?<=>)([^>]*?)red", "$1$2blue"));

Output:

<text color='red'>I am wearing a blue t-shirt</color>
<text color='red'>You are wearing a blue T-shirt</color>
Kent
  • If you're expecting that string to match on the entire file, it will fail because the `.*` can match across lines including `<` and `>` characters (consider it applied to the *entire* example). If you're expecting it to operate on individual lines, it fails for `` (as three lines). – David Schwartz Feb 06 '13 at 12:55
  • @DavidSchwartz Yep, I added a multi-line case. Anyway, using regex for this kind of job is tricky and risky... but if the OP needs a quick and dirty shot, they could give it a try. – Kent Feb 06 '13 at 13:06
  • not work for `String str= "redred/text>";` – Rain Diao Feb 06 '13 at 13:17
  • @RainDiao As I said, this is not the right job for regex. It doesn't even work for `<..> red red red `. It is hard to write a regular expression that works for all cases; if there were one, it would be a parser too. As I said in the comments above, it is a quick and dirty one-liner that I only tested with the text from the question's example, nothing more. If the OP wants a solution that works in all situations, they should use a parser. – Kent Feb 06 '13 at 13:27
  • yes ~ totally agree with you. I just treat it as a regex quiz. – Rain Diao Feb 06 '13 at 14:56
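For the repeated-occurrence case raised in these comments, one common regex trick is to match whole tags and the search word in a single alternation, and to substitute only when the word branch matched. The sketch below is a swapped-in technique, not Kent's one-liner; the class and method names are hypothetical, and it still assumes a non-empty search word and that every '<' opening a tag has a matching '>'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSkippingRegex {
    static String replaceOutsideTags(String text, String target, String replacement) {
        // Branch 1 consumes a whole tag; branch 2 (group 1) captures the search word.
        Pattern pattern = Pattern.compile("<[^>]*>|(" + Pattern.quote(target) + ")");
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // If group 1 is null, a tag matched: copy it back unchanged.
            String result = (matcher.group(1) != null) ? replacement : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "<text color='red'>red red red</text>";
        // prints <text color='red'>blue blue blue</text>
        System.out.println(replaceOutsideTags(s, "red", "blue"));
    }
}
```

Because each tag is consumed as one match, the word inside it is never visited on its own, which sidesteps the lookbehind limitations discussed above. It is still pattern matching rather than parsing, so comments, CDATA, and '>' inside attribute values remain unhandled.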
0

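A somewhat longer approach using helper string arrays: split the input on '<' and '>' and replace only the text between the opening and closing tags, leaving the contents of < ... > untouched.

        String input = "<text color='red'>I am wearing a red t-shirt</color>";
        StringBuilder result = new StringBuilder();
        // After splitting on '<', every chunk except the first has the form "tag>text".
        String[] chunks = input.split("<", -1);
        // The first chunk is plain text that precedes any tag.
        result.append(chunks[0].replace("red", "blue"));
        for (int i = 1; i < chunks.length; i++) {
            // Split once on '>': parts[0] is the tag content, parts[1] the text after it.
            String[] parts = chunks[i].split(">", 2);
            result.append('<').append(parts[0]);
            if (parts.length == 2) {
                result.append('>').append(parts[1].replace("red", "blue"));
            }
        }
        System.out.println(result);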
StarsSky
0

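I replace a 'red' only when it is not inside a tag. The lookahead requires that everything after the match, up to the end of the string, consists of plain text and complete '<' ... '>' pairs, so a 'red' that is followed by an unmatched '>' is skipped.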

String xml = "<text color='blue'>My jeans are red</text> <text color='red'>I am wearing a red t-shirt</text>red";
xml = xml.replaceAll("red(?=([^>]*<[^>]*?>)*[^<>]*$)", "blue");
System.out.println(xml);

Here is the result:

<text color='blue'>My jeans are blue</text> <text color='red'>I am wearing a blue t-shirt</text>blue
Peter Nguyen
-1
// Covers only "red" preceded by a space and followed by a space, '<', or '.'.
text = text.replace(" red ", " blue ");
text = text.replace(" red<", " blue<");
text = text.replace(" red.", " blue.");
greenapps