public String replace(){
    String[] parts = str.split("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);");
    for (int i = 0; i < parts.length; i++) {
        System.out.println(parts[i]);

    }
    return "";
}

What exactly does this line do: "String[] parts = str.split("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);");"? I tried it in my code but it didn't do anything. Could someone give an example string so I can see how it splits?

tlq

2 Answers


Here is one example of a string that will be split by the regex you provided.

public class ReverseRegex {
    public static void main(String[] args) {
        String str = "hello &fjeaifjiajwta; world";
        // Split wherever "&...;" occurs; the matched text itself is discarded
        String[] parts = str.split("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);");
        for (int i = 0; i < parts.length; i++) {
            System.out.println(parts[i]);  // prints "hello " then " world"
        }
    }
}

Here are a few more examples.

    String str = "hello &21342352352; world"; // Two pieces
    String str = "hello &xffea424242; world"; // Two pieces
    String str = "hello &xffea424242; world &hefiajeifjae; world"; // Three pieces.
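To check those piece counts, here is a small runnable sketch (the class and method names are made up for illustration). It also shows one edge case worth knowing: String.split(regex) drops trailing empty strings, but keeps a leading empty string when the input starts with a match.

```java
import java.util.Arrays;

public class SplitPieces {
    // The regex from the question: '&', then letters, digits, or 'x' + hex digits, then ';'
    static final String ENTITY = "&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);";

    // split() discards every match and returns the text between matches
    static String[] pieces(String s) {
        return s.split(ENTITY);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "hello &21342352352; world",
            "hello &xffea424242; world",
            "hello &xffea424242; world &hefiajeifjae; world",
            "&amp;a&lt;b&gt;"  // starts and ends with a match
        };
        for (String s : inputs) {
            String[] parts = pieces(s);
            System.out.println(parts.length + " " + Arrays.toString(parts));
        }
    }
}
```

The last input yields three pieces, "", "a" and "b": the leading empty string survives, while the empty string after the final "&gt;" is removed.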
merlin2011

The regex is apparently for a named or numbered HTML entity reference, but it's incomplete. It's missing the hash sign for the numbered entities and it doesn't allow for names with digits in them, like &sup2; and &frac14;. Here's what I would use:

"&(?:[a-zA-Z]+[0-9]*|#[0-9]+|#x[0-9a-fA-F]+);"
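A quick way to see the difference is to test both patterns against a few sample references with Matcher.matches(), which requires the whole string to match. This is only a sketch; the class name and sample list are illustrative.

```java
import java.util.regex.Pattern;

public class EntityRegexCompare {
    // Regex from the question
    static final Pattern ORIGINAL = Pattern.compile("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);");
    // Regex suggested in this answer
    static final Pattern IMPROVED = Pattern.compile("&(?:[a-zA-Z]+[0-9]*|#[0-9]+|#x[0-9a-fA-F]+);");

    public static void main(String[] args) {
        // "&169;" has no '#', so it is not a valid numeric reference
        String[] samples = {"&amp;", "&sup2;", "&frac14;", "&#169;", "&#xA9;", "&169;"};
        for (String s : samples) {
            // matches() succeeds only if the entire sample matches the pattern
            System.out.printf("%-9s original=%-5b improved=%b%n",
                    s, ORIGINAL.matcher(s).matches(), IMPROVED.matcher(s).matches());
        }
    }
}
```

The original pattern accepts only "&amp;" and the hash-less "&169;"; the improved one accepts every sample except "&169;".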

However, I don't see why you would want to use that regex with split(), which throws away whatever it matches and returns everything else. If you want to do something with the entities themselves, you'll most likely want to use Matcher.find(). Here's an example that just collects the entities in a list:

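// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern
List<String> matchList = new ArrayList<String>();
Pattern p = Pattern.compile("&(?:[a-zA-Z]+[0-9]*|#[0-9]+|#x[0-9a-fA-F]+);");
Matcher m = p.matcher(s);  // s is the input string
while (m.find()) {
    matchList.add(m.group());  // the whole entity, e.g. "&sup2;"
}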
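To make the split()/find() contrast concrete, here is a self-contained sketch that runs both on the same input with the improved pattern (class and method names are illustrative). Pattern.split() behaves like String.split() with a limit of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsFind {
    static final Pattern ENTITY = Pattern.compile("&(?:[a-zA-Z]+[0-9]*|#[0-9]+|#x[0-9a-fA-F]+);");

    // split(): returns the text between entities; the entities are discarded
    static String[] textBetween(String s) {
        return ENTITY.split(s);
    }

    // find(): returns the entities themselves, in order of appearance
    static List<String> entities(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = ENTITY.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "a&sup2; &lt; b&#169;";
        System.out.println(Arrays.toString(textBetween(s)));  // the pieces "a", " ", " b"
        System.out.println(entities(s));                      // [&sup2;, &lt;, &#169;]
    }
}
```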
Alan Moore
  • I'm just trying to figure out how to replace ä, ü and ö in a string with HTML escape codes, but it needs to be really fast: not every character in the string should have to be checked for it. – tlq Apr 15 '14 at 16:43
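For that use case a regex isn't needed at all. Below is a minimal sketch, assuming only the six German umlauts (lower and upper case) must be escaped and that named references such as &auml; are acceptable output; the class and helper names are hypothetical. Every character still has to be examined once, but the sketch does it in a single pass and returns the original string untouched when nothing needs escaping.

```java
public class UmlautEscaper {
    // Hypothetical helper: returns the HTML named reference for c, or null if c needs no escaping.
    // Only the umlauts are covered; extend the switch for other characters.
    private static String entityFor(char c) {
        switch (c) {
            case '\u00e4': return "&auml;";  // a-umlaut
            case '\u00f6': return "&ouml;";  // o-umlaut
            case '\u00fc': return "&uuml;";  // u-umlaut
            case '\u00c4': return "&Auml;";  // A-umlaut
            case '\u00d6': return "&Ouml;";  // O-umlaut
            case '\u00dc': return "&Uuml;";  // U-umlaut
            default:       return null;
        }
    }

    static String escape(String s) {
        StringBuilder sb = null;  // created lazily, only once something needs escaping
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = entityFor(c);
            if (entity != null) {
                if (sb == null) {
                    sb = new StringBuilder(s.length() + 16);
                    sb.append(s, 0, i);  // copy the unchanged prefix
                }
                sb.append(entity);
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // If nothing needed escaping, return the original string without copying it
        return sb == null ? s : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Gr\u00fc\u00dfe aus K\u00f6ln"));  // ß is left as-is
    }
}
```

Unicode escapes are used in the source so it compiles regardless of the compiler's default file encoding.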