It looks like text encoded in GB18030 was interpreted as Latin-1, and the resulting characters were then escaped as HTML entity references.
The unescapeHtml4() method of the StringEscapeUtils class from Apache Commons Text can be used to unescape the entity references, as the small program below demonstrates.
The program prints 笼镜 海王预告片-01.wav to standard output. This is very similar to what you asked for; only the first Chinese character differs. If Á (byte 0xC1 in Latin-1) in the input string is changed to Â (byte 0xC2), the program outputs exactly the wanted filename (慢镜 海王预告片-01.wav).
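import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.apache.commons.text.StringEscapeUtils;

public class Chinese {
    public static void main(String[] args) {
        String fname = "Áý¾µ º£ÍõÔ¤¸æÆ¬-01.wav";
        decode(fname);
    }

    static void decode(String s) {
        Charset gb18030 = Charset.forName("GB18030");
        // Turn any HTML entity references back into the Latin-1 characters they name.
        String unescaped = StringEscapeUtils.unescapeHtml4(s);
        // Each Latin-1 character maps to exactly one byte, which recovers the original bytes.
        byte[] rawBytes = unescaped.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the encoding the text was actually written in.
        String text = new String(rawBytes, gb18030);
        // Print as UTF-8 regardless of the platform default (this constructor needs Java 10+).
        PrintStream ps = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        ps.println(text);
    }
}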
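
To see why the Latin-1 round trip recovers the text, and why a single character in the input changes the first Chinese character, here is a minimal sketch that uses only the JDK. The class name MojibakeRoundTrip and the helper names garble and repair are made up for this illustration; they are not part of any library. It assumes the corruption is exactly "GB18030 bytes decoded as Latin-1", as guessed above, and it leaves out the HTML unescaping step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class MojibakeRoundTrip {
    static final Charset GB18030 = Charset.forName("GB18030");

    // Simulates the assumed corruption: GB18030 bytes wrongly decoded as Latin-1.
    static String garble(String original) {
        return new String(original.getBytes(GB18030), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: Latin-1 maps every character back to one byte,
    // and those bytes are then decoded as GB18030.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GB18030);
    }

    // Formats the GB18030 bytes of a string as hex, e.g. "C2 FD".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(GB18030)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wanted = "\u6162\u955C";   // 慢镜, the start of the wanted filename
        String printed = "\u7B3C\u955C";  // 笼镜, what the program above prints

        // The two first characters differ only in their lead byte.
        System.out.println("wanted:  " + hex(wanted));
        System.out.println("printed: " + hex(printed));

        // Garbling and then repairing returns the original text.
        System.out.println(repair(garble(wanted)).equals(wanted));
    }
}
```

The hex output makes the one-byte difference between the two filenames visible without depending on the console being able to display Chinese characters.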