
Hey, I'm working on a web application and have a problem reading UTF-8 characters from .txt files. I set up UTF-8 as described in "UTF-8 web encoding", and it works fine everywhere except in the import. I tried a lot of things (especially from "read UTF-8 string literal java"), but nothing works and I have no idea why.

The relevant code snippets:

import.jsp

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Import</title>
<link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
<script src="https://code.jquery.com/jquery-1.12.4.js"></script>
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
<script src="script.js"></script>
<link rel="stylesheet" type="text/css"
media="screen and (min-device-width: 500px)" href="style.css" />
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<form>
    <!-- show import data -->
</form>
<form id="importForm" action="${pageContext.request.contextPath}/ImportData" method="post" onsubmit="return importValidation();" enctype="multipart/form-data">
    <input type="file" name="file" accept=".txt"/>
    <input type="submit" value="Import">
</form>

</body>
</html>

ImportData Servlet:

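import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;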

@WebServlet("/ImportData")
@MultipartConfig
public class ImportData extends HttpServlet {

    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
         Part filePart = request.getPart("file"); // Retrieves <input type="file" name="file">
         ArrayList<String> texts = new ArrayList<String>();
         // Decode the upload as UTF-8; try-with-resources closes the reader even if reading fails
         try (BufferedReader buf = new BufferedReader(
                 new InputStreamReader(filePart.getInputStream(), StandardCharsets.UTF_8))) {
             String line;
             while ((line = buf.readLine()) != null) {
                 // Each line holds tab-separated values
                 for (String each : line.split("\t")) {
                     texts.add(each);
                 }
             }
         }

        System.out.println(texts);

        //create Import Data in Backend and write it into db

        response.sendRedirect("import.jsp");
    }
}

System details: Tomcat server 7 with Java 1.7

When I print texts, every non-ASCII character shows up as a square, and in HTML inputs (and text) it is rendered as � instead of the actual character.

So my question is: where and why do I lose the UTF-8 encoding?

SaScH_MaN
  • Are you viewing the output in a terminal? And are you certain the original file is a UTF-8 file? – VGR Dec 01 '17 at 14:44
  • Hard to say without more info, but the � character makes me think that you have loaded the file correctly and just fail to print Unicode characters... – Serge Ballesta Dec 01 '17 at 14:44
  • At VGR: I set up a tunnel to the server with PuTTY, and there I can see the printed output. The original file contains UTF-8 characters, so I think it is a UTF-8 file (I'm not at work anymore, so I can't check). – SaScH_MaN Dec 01 '17 at 16:43
  • At Serge Ballesta: How could I check this? I don't think that I loaded the file correctly. I checked my database, and no UTF-8 characters from this import were saved there either (via HTML inputs I can save UTF-8 characters to the db). – SaScH_MaN Dec 04 '17 at 10:08
  • OK, I didn't look closely enough... The file is not UTF-8 encoded (it is ANSI encoded). With a UTF-8 encoded file, my code works fine. – SaScH_MaN Dec 04 '17 at 10:46

1 Answer


OK, I didn't look closely enough: the file is not UTF-8 encoded (it is ANSI encoded). With a UTF-8 encoded file, this code works fine.
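To illustrate why the � showed up, here is a minimal sketch of what happens when ANSI bytes are decoded as UTF-8. The class name MisdecodeDemo and the sample character are my own choices for illustration, and it assumes the JVM ships the windows-1252 charset (common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the servlet above.
class MisdecodeDemo {

    // Decodes raw bytes with the given charset.
    // new String(byte[], Charset) replaces malformed input with the
    // charset's replacement character instead of throwing.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available on this JVM.
        Charset ansi = Charset.forName("windows-1252");

        // "ä" (U+00E4) stored as ANSI is the single byte 0xE4,
        // which is not a complete UTF-8 sequence on its own.
        byte[] ansiBytes = "\u00e4".getBytes(ansi);

        String wrong = decode(ansiBytes, StandardCharsets.UTF_8);
        String right = decode(ansiBytes, ansi);

        System.out.println("decoded as UTF-8 is U+FFFD: " + wrong.equals("\uFFFD"));
        System.out.println("decoded as windows-1252 is U+00E4: " + right.equals("\u00e4"));
    }
}
```

So the reader was doing exactly what it was told; the bytes simply were not UTF-8 in the first place.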

To make it work with another encoding, you only have to change the charset passed to the InputStreamReader so that it matches the file's actual encoding.

For example:

 BufferedReader buf = new BufferedReader(new 
       InputStreamReader(filePart.getInputStream(), "Cp1252"));

(for Windows ANSI, i.e. windows-1252)
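If uploads can arrive in either encoding, one possible approach (not what I ended up using, just a sketch) is to decode strictly as UTF-8 first and fall back to windows-1252 only when the bytes are not valid UTF-8. The class name UploadDecoder and the choice of fallback charset are assumptions; the upload would first have to be read fully into a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the servlet above.
class UploadDecoder {

    // Assumption: files that are not valid UTF-8 were saved as Windows ANSI.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    // Tries a strict UTF-8 decode; on malformed input, decodes with FALLBACK.
    static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so treat the bytes as the fallback charset.
            return new String(raw, FALLBACK);
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u00e4".getBytes(StandardCharsets.UTF_8); // two bytes: 0xC3 0xA4
        byte[] ansiBytes = "\u00e4".getBytes(FALLBACK);               // one byte: 0xE4

        System.out.println(decode(utf8Bytes).equals("\u00e4"));
        System.out.println(decode(ansiBytes).equals("\u00e4"));
    }
}
```

This is only a heuristic: a windows-1252 file whose bytes happen to form valid UTF-8 would still be decoded as UTF-8, so knowing the real encoding of the file remains the safer option.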

SaScH_MaN