change all xml entities to html

Question

I am reading a document which may contain XML entities like &#160;.

Since I need to export a txt file, I currently have to convert the entities from XML to text by hand.

This is what I have at the moment:

BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"));
String s;
while ((s = reader.readLine()) != null) {
    // Only matches a line that consists of the entity and nothing else
    if (s.equals("&#160;")) {
        s = " ";
    }
}

There are many XML entities, and I want to convert them all to text (for example &#160; -> space) without writing an if/else chain. Is there a generic way to do it?

Your question is not clear. Please show some sample input and output. — Jim Garrison, Feb 01 '11 at 21:12
You're talking about entities, not tags. Edited question accordingly. — skaffman, Feb 01 '11 at 21:24

score 2 · Accepted Answer · answered Feb 01 '11 at 22:13

When you extract the number from &#160;, you can do this:

(new String(new byte[]{(byte)160}, "ISO-8859-1")).

Here are the entity mappings: HTML ISO-8859-1 Reference
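The single-byte conversion above only covers numbers up to 255. A more general variant of the same idea, shown here as a minimal sketch rather than as the answerer's method, finds every numeric character reference with a regular expression and appends the corresponding code point. The class name `NumericEntityDecoder` and the method `decode` are hypothetical names chosen for this illustration, and the sketch assumes the references are terminated with a semicolon (`&#160;`, `&#xA0;`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class NumericEntityDecoder {
    // Matches decimal (&#160;) and hexadecimal (&#xA0;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            String hex = m.group(1);
            try {
                int codePoint = (hex != null)
                        ? Integer.parseInt(hex, 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    out.append(m.group()); // not a Unicode code point: keep as is
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // number too large for int: keep as is
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#x41;&#66;C")); // prints "ABC"
    }
}
```

Note that `&#160;` decodes to a non-breaking space (U+00A0), not to a regular space. If the exported text file should contain plain spaces, something like `decode(line).replace('\u00A0', ' ')` could be applied afterwards.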

padis

score 1 · Answer 2 · edited May 23 '17 at 12:04

I believe what you're talking about is called HTML (not XML) decoding. There is a URLDecoder class which does this for URLs (which may be what you're decoding). There is also a more general class in Apache commons for HTML decoding (specified in this question).

Edit: I was unaware of the difference between HTML and XML escapes/entities, thanks for the clarification. It appears from this question that Apache commons has a library for decoding XML entities but the standard Java library does not.
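Since the standard library has no ready-made decoder, the five entities that XML itself predefines (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) can be handled with a small lookup table. The following is only a sketch under that assumption; `PredefinedXmlEntities` and `decode` are hypothetical names, and entities declared in a DTD or HTML-only names such as `&nbsp;` are deliberately left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class PredefinedXmlEntities {
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("apos", "'");
    }

    private static final Pattern NAMED_REF =
            Pattern.compile("&(amp|lt|gt|quot|apos);");

    static String decode(String input) {
        Matcher m = NAMED_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        // A single left-to-right pass: decoded output is never re-scanned,
        // so "&amp;lt;" becomes "&lt;" and not "<".
        while (m.find()) {
            out.append(input, last, m.start());
            out.append(ENTITIES.get(m.group(1)));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a &lt; b &amp;&amp; c &gt; d")); // prints "a < b && c > d"
    }
}
```

The single pass is a deliberate choice: chaining several `String.replace` calls can decode the same text twice if `&amp;` happens to be replaced before the others.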

answered Feb 01 '11 at 21:31

Pace

I am actually looking for XML decoding. &#160; is an XML entity and not an HTML one, which would be &nbsp;. – Dejell Feb 01 '11 at 21:53

URL decoding changes `%20` to a space; entity decoding changes `&#32;` or `&#x20;` to a space, or `&#160;` to a non-breaking space - "Numeric character entity references" are valid in both XML and HTML, not XML only. – Stephen P Feb 02 '11 at 00:33
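To make the distinction between URL decoding and entity decoding concrete, here is a small sketch using the standard `java.net.URLDecoder`. The wrapper class `UrlVersusEntityDemo` and its `urlDecode` method are hypothetical names for this illustration. It shows that URL decoding handles percent escapes but leaves a numeric character reference exactly as it was.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class for illustration.
final class UrlVersusEntityDemo {
    static String urlDecode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlDecode("a%20b"));    // prints "a b"
        System.out.println(urlDecode("a&#160;b")); // prints "a&#160;b" (unchanged)
    }
}
```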
