
I'm importing a file into my code and trying to print it. The file contains:

 i don't like cake.
 pizza is good.
 i don’t like "cookies" to.
 17.
 29.

The second "don't" uses a RIGHT SINGLE QUOTATION MARK (’), and when I print it the output is:

 don�t 

The question mark is actually printed as a blank square. Is there a way to convert it to a regular apostrophe?

EDIT:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;

    public class Somethingsomething {

        public static void main(String[] args) throws FileNotFoundException, IOException {
            ArrayList<String> list = new ArrayList<String>();
            File file = new File("D:\\project1Test.txt");
            if (file.exists()) { // checks if the file exists
                FileInputStream fileStream = new FileInputStream(file);
                // No charset given: the bytes are decoded with the platform default charset
                InputStreamReader input = new InputStreamReader(fileStream);
                BufferedReader reader = new BufferedReader(input);

                String line;
                while ((line = reader.readLine()) != null) {
                    list.add(line);
                }
                reader.close();

                for (int i = 0; i < list.size(); i++) {
                    System.out.println(list.get(i));
                }
            }
        }
    }

It should print normally, but the second "don't" has a white block instead of the apostrophe. This is the file I'm using: https://www.mediafire.com/file/8rk7nwilpj7rn7s/project1Test.txt

EDIT: if it helps, the full document where the character is found is here: https://www.nytimes.com/2018/03/25/business/economy/labor-professionals.html

Cœur
abc123
  • Mmmmmmm, and your code? – Héctor M. Mar 29 '18 at 01:15
  • Printing to what? A console window, a Text Type component, or an actual printer? What exactly do you mean, "I'm importing a file into my code"? How are you importing it? – DevilsHnd - 退職した Mar 29 '18 at 01:19
  • Please read [ask] and [mcve], and then come back and [edit] your post to more clearly explain the problem, include the *relevant* portions of your problem code, and ask a specific question related to that code, and we can try to help. – Ken White Mar 29 '18 at 01:31
  • Added my code, sorry. – abc123 Mar 29 '18 at 01:32
  • Currently you open and read the text file with a "default" charset, which might change depending on the OS environment. Make sure to specify the correct charset when reading the text file. – André Schild Mar 29 '18 at 09:04
  • That's the problem: I don't know what charset to use. My professor just gave us a random text file. – abc123 Mar 29 '18 at 14:42
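Since the charset is unknown, one quick way to narrow it down is to dump the raw bytes of the file and look only at the non-ASCII ones. A minimal sketch, assuming Java 7+ (`java.nio.file`); the class name `ByteDump` is just for illustration, and the default path is the one from the question. If the curly apostrophe shows up as the single byte 0x92, the file is in a Windows code page such as windows-1252 or windows-1250; if it shows up as the three bytes E2 80 99, it is UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ByteDump {
    // Lists the offset and hex value of every byte >= 0x80 (i.e. every non-ASCII byte).
    static String describeNonAscii(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b >= 0x80) {
                sb.append(String.format("offset %d: 0x%02X%n", i, b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the question; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "D:\\project1Test.txt");
        System.out.print(describeNonAscii(Files.readAllBytes(path)));
    }
}
```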

3 Answers


It’s all about character encoding. The way characters are represented isn't always the same and they tend to get misinterpreted.

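Characters are stored as numbers, and which number stands for which character depends on the encoding standard (and there are many of them). For example, "a" is 97 (hex 61) in both ASCII and UTF-8, but the right single quotation mark ’ is the single byte 0x92 in windows-1252 and the three bytes E2 80 99 in UTF-8.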
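To make the numbers concrete (note that 97 and 61 are the same value written in decimal and hexadecimal, and UTF-8 encodes every ASCII character with the same single byte as ASCII), here is a small sketch that prints the bytes Java produces for "a" and for the curly apostrophe U+2019 in a few encodings. The class name `EncodingDemo` is only illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Formats each byte as two-digit uppercase hex, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String quote = "\u2019"; // RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe

        System.out.println(hex("a".getBytes(StandardCharsets.US_ASCII))); // 61
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_8)));    // 61
        System.out.println(hex(quote.getBytes(StandardCharsets.UTF_8)));  // E2 80 99
        System.out.println(hex(quote.getBytes(cp1252)));                  // 92
    }
}
```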

When you see odd characters such as the question mark in this case (called the replacement character, U+FFFD), it usually means text written in one encoding is being decoded as another, and the replacement character stands in for the bytes that could not be decoded.
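A minimal sketch of that mismatch, assuming the file stores the apostrophe as the single byte 0x92 (as windows-1252 and windows-1250 both do). Decoded with a matching code page the curly apostrophe comes back; decoded as UTF-8, 0x92 is not a valid byte sequence, so Java substitutes U+FFFD. The class and field names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // "don’t" as a Windows code page stores it: the apostrophe is the single byte 0x92.
    static final byte[] STORED = {'d', 'o', 'n', (byte) 0x92, 't'};

    public static void main(String[] args) {
        String right = new String(STORED, Charset.forName("windows-1252"));
        String wrong = new String(STORED, StandardCharsets.UTF_8);

        System.out.println(right.equals("don\u2019t")); // true: correct curly apostrophe
        System.out.println(wrong.equals("don\uFFFDt")); // true: replacement character
    }
}
```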

To fix your problem you need to tell your reader to read your file using a specific character encoding, say SOME-CHARSET.

Replace this:

    InputStreamReader input = new InputStreamReader(fileStream);

with this:

    InputStreamReader input = new InputStreamReader(fileStream, "SOME-CHARSET");
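Put together, a sketch of reading the whole file with an explicit charset might look like this. It passes a `Charset` object instead of a name string, which avoids the checked `UnsupportedEncodingException` of the string overload. "windows-1250" is only the value the asker later reported working; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Reads all lines of a file, decoding the bytes with the given charset
    // instead of the platform default.
    static List<String> readLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed charset; substitute whatever matches your file.
        for (String line : readLines("D:\\project1Test.txt", Charset.forName("windows-1250"))) {
            System.out.println(line);
        }
    }
}
```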

A list of charsets is available here. Unfortunately, you may have to try them one by one. A short list of the most common ones can be found here.
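Trying charsets one by one can be partly automated with a strict decoder, which throws instead of silently substituting U+FFFD. A sketch, assuming a hand-picked candidate list (the class name and candidates are illustrative). A charset that fails is definitely wrong, but one that passes is not necessarily right: single-byte charsets such as ISO-8859-1 accept any byte, so you still have to eyeball the decoded text.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CharsetProbe {
    // Returns the candidates that decode the bytes without malformed or unmappable input.
    static List<String> cleanDecoders(byte[] data, String... candidates) {
        List<String> ok = new ArrayList<>();
        for (String name : candidates) {
            CharsetDecoder decoder = Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                ok.add(name);
            } catch (CharacterCodingException e) {
                // These bytes are not valid in this charset; rule it out.
            }
        }
        return ok;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "D:\\project1Test.txt";
        byte[] data = Files.readAllBytes(Paths.get(path));
        System.out.println(cleanDecoders(data, "UTF-8", "windows-1250", "windows-1252", "ISO-8859-1"));
    }
}
```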

elmiomar
  • I tried changing the encoding to UTF-8 and I get the same result. If it helps, I'm using this file: https://www.mediafire.com/file/8rk7nwilpj7rn7s/project1Test.txt – abc123 Mar 29 '18 at 02:19
  • There are a lot of [charsets](http://www.iana.org/assignments/character-sets/character-sets.xhtml). Try at least [these](https://docs.oracle.com/javase/9/docs/api/java/nio/charset/Charset.html). Btw, what platform are you on? – elmiomar Mar 29 '18 at 02:30
  • @abc123 Check my post – Héctor M. Mar 29 '18 at 03:41
  • I'm on Windows 10, and I tried the ones listed in the link: they either gave the same output or 椠摯湴❴楫攠捡步⸍ੰ楺穡⁩猠杯潤⸍੩⁤潮璒琠汩步•捯潫楥猢⁴漮ഊㄷ⸍ਲ㤮 – abc123 Mar 29 '18 at 04:42
  • If it helps, this is where my professor found the document: https://www.nytimes.com/2018/03/25/business/economy/labor-professionals.html. I think he copied and pasted it into a text file. – abc123 Mar 29 '18 at 04:58
  • @abc123 can you copy paste the text from your question to a new file, save, and then test with the new file? – elmiomar Mar 29 '18 at 05:12
  • I separated out the symbol that's giving me a problem; the original file is much larger. – abc123 Mar 29 '18 at 14:39
  • @abc123 looking at the link you provided, I couldn't find it. Can you copy this `i don’t like "cookies" to.` paste it in a file, save and use this file with your code. (copy from this comment). – elmiomar Mar 29 '18 at 15:26
  • I found the charset, actually! It's windows-1250; I had to go through the charsets one by one. Thanks everyone, though! I wrote the "i don't like cookies" line myself, but I used the apostrophe from that text file. – abc123 Mar 29 '18 at 15:37
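The question also asked how to turn the curly apostrophe into a regular one. Once the file is decoded with the right charset (windows-1250, per the asker's last comment), the curly quote is just the `char` U+2019, so a plain `String.replace` does it. A minimal sketch; the class and method names are hypothetical, and it also maps the opening quote U+2018 on the assumption that you want both normalized.

```java
public class QuoteNormalizer {
    // Replaces the typographic single quotes (U+2018, U+2019) with the ASCII apostrophe.
    static String toAsciiApostrophes(String s) {
        return s.replace('\u2018', '\'').replace('\u2019', '\'');
    }

    public static void main(String[] args) {
        // Prints: i don't like "cookies" to.
        System.out.println(toAsciiApostrophes("i don\u2019t like \"cookies\" to."));
    }
}
```

Note this only helps after correct decoding: if the text was read with the wrong charset, the apostrophe is already U+FFFD and the original character is lost.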

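Your problem is almost certainly the encoding scheme you are using. You can read a file in almost any encoding scheme you want; just tell Java how your input was encoded. UTF-8 is common on Linux. On Windows, the legacy default code page depends on the system locale, e.g. CP-1252 for Western European locales or CP-1250 for Central European ones.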

This is the sort of problem you have all the time if you are processing files created on a different OS.
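To see which encoding Java assumes when you don't specify one (as with `new InputStreamReader(stream)` or `new FileReader(path)`), you can print the default charset. A minimal sketch with an illustrative class name. On older JDKs running on Windows this is typically the ANSI code page such as windows-1252; since JDK 18 (JEP 400) the default is UTF-8 on every OS, which is another reason not to rely on it.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // The charset Java falls back to when none is passed explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
    }
}
```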

See here and here.

Terry
  • CP-1250 and 1252 are for legacy applications; the most commonly used today is Unicode UTF-16. [here](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx) and [here](https://msdn.microsoft.com/en-us/library/windows/desktop/dd374081(v=vs.85).aspx) – elmiomar Mar 29 '18 at 02:16

I'll give you a different approach...

Use the appropriate means for reading plain text files. Try this:

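    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class TextUtil { // wrapper class name is only for illustration
        // Reads a whole text file into one String, or returns null on any I/O error.
        // Note: FileReader(String) decodes with the platform default charset.
        public static String getTxtContent(String path) {
            try (BufferedReader br = new BufferedReader(new FileReader(path))) {
                StringBuilder sb = new StringBuilder();
                String line = br.readLine();

                while (line != null) {
                    sb.append(line);
                    sb.append(System.lineSeparator());
                    line = br.readLine();
                }
                return sb.toString();
            } catch (IOException fex) {
                return null;
            }
        }
    }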
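Because `FileReader(String)` uses the platform default charset, just like the question's `new InputStreamReader(fileStream)`, the snippet above on its own still depends on the default charset. A hedged variant of the same idea that lets the caller choose the charset, using `java.nio.file.Files.newBufferedReader` (Java 7+). The class name is illustrative. One difference worth knowing: this reader uses a strict decoder, so bytes invalid in the chosen charset raise a `MalformedInputException` (an `IOException`, so here the method returns `null`) instead of silently becoming U+FFFD.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TxtContent {
    // Same approach as getTxtContent, but decodes with the given charset.
    public static String getTxtContent(String path, Charset charset) {
        try (BufferedReader br = Files.newBufferedReader(Paths.get(path), charset)) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
            return sb.toString();
        } catch (IOException e) {
            return null; // keeps the original behaviour of returning null on failure
        }
    }
}
```

For example, `TxtContent.getTxtContent("D:\\project1Test.txt", Charset.forName("windows-1250"))` would read the asker's file with the charset they reported.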
Héctor M.