I have a string in which some, but not all, of the special characters appear in escaped form. I tried converting the string to UTF-8 and ISO_8859_1, but that didn't work. The special characters are Turkish.

Example of data:

<span class="text-nowrap">DOMATES &#xC7;ORBA </span> <span class="text-theme-color-2">|</span>
            <span class="text-nowrap">BALKAN USUL&#xDC; K&#xD6;FTE </span> <span class="text-theme-color-2">|</span>
            <span class="text-nowrap">PEYN&#x130;RL&#x130; MAKARNA </span> <span class="text-theme-color-2">|</span>
            <span class="text-nowrap">BAKLAVA </span> <span class="text-theme-color-2">|</span>

Some of the escaped characters correspond to:

&#xC7; = Ç

&#xDC; = Ü

I tried these, but they didn't work:

 data = new String(text.getBytes(StandardCharsets.ISO_8859_1));

 data = StandardCharsets.UTF_8.decode(str_to_bb(text, StandardCharsets.UTF_8)).toString();

[SOLVED]

Thanks. I solved it like this:

data = StringEscapeUtils.unescapeHtml4(text);
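For context: `StringEscapeUtils` is not part of the JDK; it comes from Apache Commons (Lang 3 or Text). The charset conversions above presumably had no effect because sequences like `&#xC7;` are HTML numeric character references made of plain ASCII characters, not mis-encoded bytes. If adding the library is not an option, a minimal sketch using only the standard library could look like the following. The class name `NumericEntityDecoder` and its `decode` method are hypothetical, and the sketch handles numeric references only; named entities such as `&amp;` are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the Apache Commons implementation.
// Decodes numeric character references only (e.g. &#xC7; or &#199;).
final class NumericEntityDecoder {

    // Group 1 captures hexadecimal digits, group 2 captures decimal digits.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(text, last, m.start());
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    // Not a valid Unicode code point: keep the reference as written.
                    out.append(m.group());
                }
            } catch (NumberFormatException e) {
                // Number too large to parse: keep the reference as written.
                out.append(m.group());
            }
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

Calling `NumericEntityDecoder.decode(text)` on the sample data should turn `&#xC7;ORBA` into `ÇORBA` and leave the surrounding `<span>` markup unchanged. For full HTML input, including named entities, the library call above is the more complete choice.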
gurkan
  • [How to unescape HTML character entities in Java?](https://stackoverflow.com/questions/994331/how-to-unescape-html-character-entities-in-java) – Abra Oct 07 '22 at 13:07
  • such a weird situation... ¬¬ ... – aran Oct 07 '22 at 13:20

0 Answers