79

Using Java, I want to go through the lines of a text and replace all ampersand symbols (&) with the XML entity reference &amp;.

I scan the lines of the text, and then each word in the text, with the Scanner class. Then I use a CharacterIterator to iterate over each character of the word. However, how can I replace the character? First, Strings are immutable objects. Second, I want to replace a single character (&) with several characters (&amp;). How should I approach this?

CharacterIterator it = new StringCharacterIterator(token);
for(char ch = it.first(); ch != CharacterIterator.DONE; ch = it.next()) {
       if(ch == '&') {

       }
}
Rob Hruska
user42155

11 Answers

138

Try using String.replace() or String.replaceAll() instead.

String my_new_str = my_str.replace("&", "&amp;");

(Both replace all occurrences; replaceAll allows use of regex.)
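
A minimal sketch of the difference between the two methods. The class name `ReplaceDemo` and the helper `escapeAmpersands` are made up for illustration; only `String.replace` and `String.replaceAll` come from the JDK.

```java
// Hypothetical demo class; the names are illustrative, not from the answer.
class ReplaceDemo {

    // Literal replacement: every "&" becomes "&amp;"; no regex is involved.
    static String escapeAmpersands(String text) {
        return text.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("Tom & Jerry & Co")); // Tom &amp; Jerry &amp; Co

        // replaceAll treats its first argument as a regex, so "." matches any character.
        System.out.println("a.b".replaceAll(".", ","));           // ,,,

        // replace treats its first argument literally, so only the dot changes.
        System.out.println("a.b".replace(".", ","));              // a,b
    }
}
```

For a fixed, literal search string such as "&", `replace` is usually the safer choice because no character in it can be misread as regex syntax.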

Amber
  • 72
    Be careful with replaceAll, because it treats its first argument as a regular expression. For example, "h.e.l.l.o".replaceAll(".", ",") will give you ",,,,,,,,,"! Since Java 1.5 there is a String.replace(CharSequence, CharSequence) method, which does something similar but doesn't interpret its first argument as a regular expression. – Peter Štibraný Aug 05 '09 at 17:20
  • 1
    @PeterŠtibraný Or... you could just escape the character you want to replace : `replaceAll("[.]", ",")` – Yassin Hajaj Apr 20 '18 at 14:31
  • 2
    this is not how you would escape a character. I think peter's point is that using regex when you dont need to has potential for unintended side effects. – user4504267 Apr 20 '18 at 16:37
  • Just a side note: we can also use %26 instead of &amp;. It looks like in some REST calls %26 works where &amp; does not. – Raj Stha Sep 01 '23 at 17:50
92

The simple answer is:

token = token.replace("&", "&amp;");

Despite its name, replace also replaces every occurrence, just as replaceAll does; it simply doesn't use a regular expression. That is the better fit here, both for performance and as good practice: don't use regular expressions by accident, because they give special meaning to characters you won't be paying attention to.

If you already know this code is a performance hot spot, and that is where your question is coming from, Sean Bright's answer is probably as good as is worth thinking about, absent a specific performance target and performance testing. It certainly doesn't deserve the downvotes. Just use StringBuilder instead of StringBuffer unless you need the synchronization.

That being said, there is a somewhat deeper potential problem here. Escaping characters is a known problem which lots of libraries out there address. You may want to consider wrapping the data in a CDATA section in the XML, or you may prefer to use an XML library (including the one that comes with the JDK now) to actually generate the XML properly (so that it will handle the encoding).
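
As a rough sketch of the "let an XML library do the escaping" idea, the following uses the StAX `XMLStreamWriter` that ships with the JDK. The class name `XmlWriterDemo`, the helper `toElement`, and the element name `note` are assumptions made up for this example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; names and the "note" element are illustrative only.
class XmlWriterDemo {

    // Wraps the text in a single element and lets the StAX writer escape it.
    static String toElement(String name, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(name);
            writer.writeCharacters(text); // the writer escapes markup characters in character data
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The ampersand in the text is written out as an entity reference.
        System.out.println(toElement("note", "Tom & Jerry"));
    }
}
```

The caller never touches the ampersand directly here, which avoids hand-rolled escaping rules altogether.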

Apache also has an escaping library as part of Commons Lang.

Yishai
16
StringBuilder s = new StringBuilder(token.length());

CharacterIterator it = new StringCharacterIterator(token);
for (char ch = it.first(); ch != CharacterIterator.DONE; ch = it.next()) {
    switch (ch) {
        case '&':
            s.append("&amp;");
            break;
        case '<':
            s.append("&lt;");
            break;
        case '>':
            s.append("&gt;");
            break;
        default:
            s.append(ch);
            break;
    }
}

token = s.toString();
Sean Bright
  • 2
    Using a String instead would result in the creation of a temporary String object per iteration. I'm not sure what alternative you would suggest. – Sean Bright Aug 05 '09 at 17:10
  • Are we really assuming that the OP knows about `CharacterIterator` and not `String.replaceAll()`? – Sean Bright Aug 05 '09 at 17:17
  • 4
    +1: Not sure why this received 2 downvotes - It's likely to be far more efficient than replaceAll() - After all why use regular expressions when simply matching on a single character? – Adamski Aug 05 '09 at 17:21
  • Your example solution would need a StringBuffer but the solution to the general problem does not require one. – Taylor Leese Aug 05 '09 at 17:21
  • @Taylor L - I guess we just disagree that the question, as asked, is a "general problem." – Sean Bright Aug 05 '09 at 17:26
  • 6
    Further to my previous comment, I just measured the performance of replaceAll and Sean's solution against a 5000 character String where approximately 10% of characters are '&' - The average replaceAll time is 0.92ms while Sean's solution is 0.29ms. Using a StringBuilder improves the time further to 0.23ms. – Adamski Aug 05 '09 at 17:30
  • 1
    @Adamski - I was just going to do that performance test myself. Thanks for doing the leg work for me! – Sean Bright Aug 05 '09 at 17:40
  • Why complicate the code significantly by prematurely optimizing? Especially when the performance increase is so tiny. Make it work right first, make it readable and maintainable and only after you've done that, if you find you have a performance problem and have profiled your code to pinpoint the exact problem, should you worry about doing microoptimizations like this. – IRBMe Aug 06 '09 at 07:21
  • 3
    It wasn't premature optimization - it was my answer to the question. It just also happens to be faster than `String.replaceAll()`, but that wasn't the reason for suggesting it. – Sean Bright Aug 06 '09 at 12:27
  • Thanks Sean, this was really helpful to me where I wanted to replace "," with "." and "." with "," in a single string. – Umesh May 18 '12 at 06:30
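
The swap described in the last comment shows where a single pass over the characters matters: two chained replace calls would collapse both characters into one. A minimal sketch, with a hypothetical class `SwapDemo` and helper `swapSeparators` that are not part of the answer:

```java
// Hypothetical helper illustrating a single-pass swap; not from the answer above.
class SwapDemo {

    // Swaps ',' and '.' in one pass so that neither replacement clobbers the other.
    static String swapSeparators(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case ',':
                    out.append('.');
                    break;
                case '.':
                    out.append(',');
                    break;
                default:
                    out.append(ch);
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapSeparators("1,234.56"));                      // 1.234,56

        // Chained calls lose the distinction between the two characters.
        System.out.println("1,234.56".replace(',', '.').replace('.', ',')); // 1,234,56
    }
}
```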
10

You may also want to make sure you're not replacing an occurrence that has already been replaced. You can use a regular expression with a negative lookahead to do this.

For example:

String str = "sdasdasa&amp;adas&dasdasa";  
str = str.replaceAll("&(?!amp;)", "&amp;");

This would result in the string "sdasdasa&amp;adas&amp;dasdasa".

The regex pattern "&(?!amp;)" basically says: Match any occurrence of '&' that is not followed by 'amp;'.
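
A small sketch of what the lookahead buys and where it stops. The class `LookaheadDemo` and the helper `escapeUnescapedAmpersands` are hypothetical names; the pattern is the one from this answer.

```java
// Hypothetical demo class; only the regex pattern comes from the answer.
class LookaheadDemo {

    // Escapes every '&' that is not already the start of "&amp;".
    static String escapeUnescapedAmpersands(String text) {
        return text.replaceAll("&(?!amp;)", "&amp;");
    }

    public static void main(String[] args) {
        String once = escapeUnescapedAmpersands("a&amp;b&c");
        String twice = escapeUnescapedAmpersands(once);
        System.out.println(once);               // a&amp;b&amp;c
        System.out.println(once.equals(twice)); // true: a second pass changes nothing

        // Limitation: the lookahead only knows about "&amp;", so other
        // entities such as "&lt;" are still escaped again.
        System.out.println(escapeUnescapedAmpersands("1 &lt; 2")); // 1 &amp;lt; 2
    }
}
```

Whether already-escaped input should be left alone depends on where the text comes from; if the input is always raw text, a plain literal replace is the simpler choice.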

Sagar P. Ghagare
Robert Durgin
6

Just create a string that contains all of the data in question and then use String.replaceAll() like below.

String result = yourString.replaceAll("&", "&amp;");
Taylor Leese
  • If the data is too large, creating a single string consisting of all of the data may be disadvantageous. We can do line-by-line as well. – Bhushan Nov 18 '11 at 19:38
  • Using replaceAll in this case is WRONG! If possible, always use replace instead of replaceAll. It is more efficient and less error prone. – John Henckel Jun 10 '14 at 19:37
3

You can use stream and flatMap to map & to &amp;

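    // Requires: import java.util.stream.IntStream;
    String str = "begin&end";
    String newString = str.chars()
        // expand each '&' into the characters of "&amp;"; pass everything else through
        .flatMap(ch -> (ch == '&') ? "&amp;".chars() : IntStream.of(ch))
        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
        .toString();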
dehasi
1

Escaping strings can be tricky, especially if you want to take Unicode into account. XML is probably one of the simpler formats to escape, but it still has its pitfalls. I would recommend taking a look at the StringEscapeUtils class in Apache Commons Lang and its handy escapeXml method.

Chris Vest
1

Try this code. You can replace any character with another given character. Here I replace the letter 'a' with the "_" character in the given string "abcdefaa".

Output --> _bcdef__

    public class Replace {

        public static void replaceChar(String str, String target) {
            String result = str.replaceAll(target, "_");
            System.out.println(result);
        }

        public static void main(String[] args) {
            replaceChar("abcdefaa", "a");
        }

    }
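
Because replaceAll treats the target as a regular expression, a target such as "." would match every character. A minimal sketch of a literal-safe variant, assuming a hypothetical class `LiteralReplaceDemo` and helper `replaceLiteral`; `Pattern.quote` and `Matcher.quoteReplacement` are standard JDK methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the answer.
class LiteralReplaceDemo {

    // Replaces every literal occurrence of target, even if it contains regex metacharacters.
    static String replaceLiteral(String str, String target, String replacement) {
        return str.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println("a.b.c".replaceAll(".", "_"));      // _____
        System.out.println(replaceLiteral("a.b.c", ".", "_")); // a_b_c

        // String.replace gives the same literal behaviour without any regex.
        System.out.println("a.b.c".replace(".", "_"));         // a_b_c
    }
}
```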
chamzz.dot
0

If you're using Spring you can simply call HtmlUtils.htmlEscape(String input), which will handle the '&' to '&amp;' translation.

Adamski
0
// I think this will work. You don't have to replace at the even positions; that's just an example.

    public String emphasize(String phrase, char ch)
    {
        // Note: ch is not used in this example.
        char[] phraseArray = phrase.toCharArray(); // mutable copy of the immutable String
        for (int i = 0; i < phraseArray.length; i++)
        {
            if (i % 2 == 0) // even index
            {
                phraseArray[i] = '*';
            }
        }
        // Return the result as a new String; the original is left unchanged.
        return new String(phraseArray);
    }
-2
String taskLatLng = task.getTask_latlng().replaceAll( "\\(","").replaceAll("\\)","").replaceAll("lat/lng:", "").trim();
Nikos Hidalgo