I have this Java code that takes a PDF file and extracts all of its text:
File file= new File("C:/file.pdf");
PDDocument doc= PDDocument.load(file);
PDFTextStripper s = new PDFTextStripper();
String content = s.getText(doc);
doc.close();
System.out.println(content);
When we run the application on Windows, it works correctly and extracts all the text. However, when we deploy it to our Linux server, the Spanish accented characters are turned into "strange" characters, for example "carÃ¡cter" (it should be "carácter"). I tried converting the String to bytes and then back to a String using UTF-8:
byte[] b = content.getBytes(Charset.forName("UTF-8"));
String text= new String(b);
System.out.println(text);
But it does not work: on Windows it still works fine, but on the Linux server the Spanish accents and similar characters are still displayed incorrectly. I would expect that if it works correctly on Windows, it should work on Linux too. Any idea what the cause could be, or what I can do? Thank you.
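To narrow this down, here is a small check without PDFBox. It is only a sketch under an assumption I have not confirmed: that the problem is a mismatch between the charset used to write the output and the one used to read or display it, not the PDF extraction itself. The class name `EncodingCheck` and the helper `roundTrip` are made up for this test. As far as I can tell, `new String(b)` without a charset decodes with the platform default, so my conversion above either changes nothing or corrupts the text, depending on the machine.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hypothetical helper: encode as UTF-8, then decode with an explicit charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] b = text.getBytes(StandardCharsets.UTF_8);
        return new String(b, decodeWith);
    }

    public static void main(String[] args) throws Exception {
        String content = "car\u00e1cter"; // "carácter", escaped so the source file encoding does not matter

        // The charset the JVM uses when none is given (new String(b), System.out, ...).
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoding with the same charset gives back the original String.
        System.out.println(roundTrip(content, StandardCharsets.UTF_8).equals(content));

        // Decoding UTF-8 bytes as ISO-8859-1 reproduces the kind of garbling I see.
        System.out.println(roundTrip(content, StandardCharsets.ISO_8859_1));

        // Writing through a PrintStream with an explicit encoding, instead of the default one.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(content);
    }
}
```

If the first line printed on the server is not UTF-8, starting the JVM with `-Dfile.encoding=UTF-8` or printing through an explicit UTF-8 `PrintStream` as above might be worth trying. The terminal or log viewer also has to interpret the bytes as UTF-8, which I have not verified on our server.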