0

I am reading some info from a database in MySQL and I need to write them in an excel file

XSSFWorkbook workbook =new XSSFWorkbook();
...
cell.setCellValue(addressValue); // suppose addressValue is what I have obtained from mysql
....
try (FileOutputStream outputStream = new FileOutputStream(myfile)) {
    workbook.write(outputStream);   
} 

In MySQL, the table charset is UTF8, and the field is considered utf8_general_ci.

But this way, it does not handle special characters. for example I see something like "名古屋市" while it should be "名古屋市".

Or

"Москва" which should be "Москва".

Or

"St. Honoré" which should be "St. Honoré".

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
Shila Mosammami
  • 999
  • 6
  • 20
  • 2
    Somewhere in the process between the database and POI the UTF8 in the string got URL-encoded. I believe POI has handled UTF-8 just fine for many years. You need to figure out where the URL-encoding happened, likely the process that set `addressValue`. Are you _"...reading ... from a database"_ via a web service? Can you print out (or examine in a debugger) the value of `addressValue`? – Jim Garrison Feb 17 '21 at 21:54
  • The table in database is getting filled by another program that I have no idea of. I am using a simple prepared statement to fetch my data. String city=resultset.getString("city"); – Shila Mosammami Feb 18 '21 at 08:15
  • I need to mention that the data in database is saved like "& #21517;& #21476; 23627; 24066;" I mean that in the table I see the data like that. – Shila Mosammami Feb 18 '21 at 08:27
  • 2
    Those are XML character escapes, and has nothing to do with Java itself: it looks like your data was stored with those escapes, and you need to unescape them. For example, see [how to unescape XML in java](https://stackoverflow.com/questions/2833956/how-to-unescape-xml-in-java) – Mark Rotteveel Feb 18 '21 at 16:55

1 Answers1

1

1- Thanks to Mark Rotteveel, I share the code that solved my problem here maybe it works for others as well. The link is: how to unescape XML in java

public static String unescapeXML( final String xml )
{
    Pattern xmlEntityRegex = Pattern.compile( "&(#?)([^;]+);" );
    // Matcher requires a StringBuffer instead of a StringBuilder
    StringBuffer unescapedOutput = new StringBuffer( xml.length() );

    Matcher m = xmlEntityRegex.matcher( xml );
    Map<String,String> builtinEntities = null;
    String entity;
    String hashmark;
    String ent;
    int code;
    while ( m.find() ) {
        ent = m.group(2);
        hashmark = m.group(1);
        if ( (hashmark != null) && (hashmark.length() > 0) ) {
            code = Integer.parseInt( ent );
            entity = Character.toString( (char) code );
        } else {
            //must be a non-numerical entity
            if ( builtinEntities == null ) {
                builtinEntities = buildBuiltinXMLEntityMap();
            }
            entity = builtinEntities.get( ent );
            if ( entity == null ) {
                //not a known entity - ignore it
                entity = "&" + ent + ';';
            }
        }
        m.appendReplacement( unescapedOutput, entity );
    }
    m.appendTail( unescapedOutput );

    return unescapedOutput.toString();
}

private static Map<String,String> buildBuiltinXMLEntityMap()
{
    Map<String,String> entities = new HashMap<String,String>(10);
    entities.put( "lt", "<" );
    entities.put( "gt", ">" );
    entities.put( "amp", "&" );
    entities.put( "apos", "'" );
    entities.put( "quot", "\"" );
    return entities;
}

2- Ok Guys, in case anyone needs it I found out a simpler way to achieve it: you have to import org.apache.commons.text.StringEscapeUtils;

StringEscapeUtils.unescapeHtml4("Your String");  

This is how I printed my text

you see in the pticture the text I printed as an output.

Shila Mosammami
  • 999
  • 6
  • 20