Netbeans does not decode special characters

Question

I'm trying to parse an HTML file using Jsoup. The HTML contains a special character (€) that I want to remove. This is the original markup:

<span class="price-value">
    49,99 €
</span>

However, Netbeans shows this when printing that element:

49.99 ?

Therefore, I cannot do this:

price.replace( "€", "" ).replace( ",", "." ).trim();

Neither this:

price.replace( "\\?", "" ).replace( ",", "." ).trim();

What can I do about it?

You should take a look at this https://docs.oracle.com/javase/tutorial/i18n/text/string.html — Natecat, Apr 02 '16 at 18:06

score 0 · Answer 1 · edited May 23 '17 at 11:50


Modified from here:

To match individual characters, you can simply include them in a character class, either as literals or via the \u20AC syntax

The Unicode escape for the Euro sign is \u20AC.

Note: I'm not sure why it would be displayed as a ?, but that might be just because it's not ASCII, and might be missing in the font.
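As a minimal sketch of the escape approach, the snippet below writes the Euro sign as \u20AC so that the encoding of the source file does not affect the match. The class name PriceCleaner and the method clean are hypothetical names chosen for illustration, and the sample string is taken from the question.

```java
// Hypothetical helper class for illustration; not part of the original question.
class PriceCleaner {
    // Euro sign written as a Unicode escape, so the source file's encoding does not matter.
    static final String EURO = "\u20AC";

    // Strips the Euro sign, switches the decimal comma to a dot, and trims whitespace.
    static String clean(String price) {
        return price.replace(EURO, "").replace(",", ".").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("49,99 \u20AC")); // prints 49.99
    }
}
```

Because the result contains only ASCII characters, it should print the same way whatever encoding the console uses.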

answered Apr 02 '16 at 18:38 by Laurel; edited May 23 '17 at 11:50 by Community

score 0 · Answer 2 · answered Apr 02 '16 at 19:35


Use this ->

<span class="price-value">
49,99 &euro;
</span>

&euro; is the HTML entity for the € sign.

answered Apr 02 '16 at 19:35 by P Sharma

score 0 · Accepted Answer · edited May 23 '17 at 12:15

Netbeans shows this when printing that element

Almost certainly this is because your NetBeans console hasn't been configured to support Unicode chars, which is why you've been misled. For a solution to that, see: How to change default encoding in NetBeans 8.0

So the document is fine, the replace( "€", "" ) call would have worked (String.replace matches literal text, so no regular expression is involved), and there's no need to change anything else.

Here's a minimal working example of the original document getting parsed correctly, the Euro symbol replaced, and 49.99 returned.

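// Requires org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.nodes.Element.
Document doc = Jsoup.parse("<html><body><span class=\"price-value\">49,99 €</span></body></html>");
Element span = doc.select("span").get(0);
// Strip the Euro sign, turn the decimal comma into a dot, and trim: prints 49.99
System.out.println( span.text().replace("€", "").replace(",", ".").trim() );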
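To check whether the string or only the console is at fault, one option is to print the character's code point, which is plain ASCII and survives any console encoding. The sketch below also mimics what a console limited to ASCII would show, and prints through an explicit UTF-8 stream. The class and method names are hypothetical, and treating the console as ASCII-only is an assumption used only to reproduce the ? from the question.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class for illustration; not part of the original answer.
class EuroConsoleCheck {
    // Returns the first code point in hex, which is safe to print on any console.
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    // Encodes to ASCII and decodes again, mimicking output that cannot represent the Euro sign.
    static String asAsciiConsoleWouldShow(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII); // unmappable characters become '?'
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String price = "49,99 \u20AC";
        System.out.println(codePoint("\u20AC"));            // U+20AC
        System.out.println(asAsciiConsoleWouldShow(price)); // 49,99 ?
        // Explicit UTF-8 output; whether the glyph renders still depends on the console itself.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(price);
    }
}
```

If the code point comes out as U+20AC, the string holds a real Euro sign and the ? is introduced only when the text is written to the console.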

It's weird, because it has always been working until now. I've reinstalled Netbeans and now it works fine. — Dani M, Apr 04 '16 at 10:51
