I obtained some text from the Internet. It sometimes contains character sequences like "&amp;", "&quot;", etc.
At first I guessed they were some kind of Unicode characters in HTML. They are actually HTML-encoded strings (HTML entities); thanks to jason for pointing that out.
How should I filter all of these out of the text? I don't want any HTML-related character codes left in it. By the way, I am not talking about HTML tags in the text, only these encoded entities.
Thanks.
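
To make the goal concrete, here is a minimal sketch of the two things "filter out" could mean. It assumes Python 3, since the question does not name a language. The helper names `decode_entities`, `strip_entities` and `ENTITY_RE` are made up for illustration; `html.unescape` is the standard-library function that turns entities back into the characters they stand for.

```python
import html
import re

# Hypothetical pattern: matches named (&amp;), decimal (&#39;)
# and hexadecimal (&#x27;) character references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_entities(text: str) -> str:
    """Replace each HTML entity with the character it encodes."""
    return html.unescape(text)


def strip_entities(text: str) -> str:
    """Delete each HTML entity from the text without decoding it."""
    return ENTITY_RE.sub("", text)


sample = "Tom &amp; Jerry said &quot;hi&quot;"
print(decode_entities(sample))  # Tom & Jerry said "hi"
print(strip_entities(sample))   # Tom  Jerry said hi
```

Decoding is probably what is wanted in most cases, because it keeps the original characters. Stripping removes them entirely and can leave doubled spaces behind, as in the second output line. Neither function touches HTML tags.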