0

I need to read a local file. I have been studying some threads and have seen various ways to handle it, but most of them rely on a file input.

I need to load the file directly from code instead.

I have studied this thread:

How to read a local text file?

With it I was able to read the file.

The surprising part came when I split the text into lines and words: accented letters were shown as �.

The code I have right now is:

myFileReader.js

function readTextFile(file) {

    var rawFile = new XMLHttpRequest();
    rawFile.open("GET", file, false);
    rawFile.onreadystatechange = function () {
        if (rawFile.readyState === 4) {
            if (rawFile.status === 200 || rawFile.status === 0) {
                let allText = rawFile.responseText;
                console.log('The complete text is', allText);
                let lineArr = intoLines(allText);
                let firstLineWords = intoWords(lineArr[0]);
                let secondLineWords = intoWords(lineArr[1]);

                console.log('Our  first line is: ', lineArr[0]);

                let atlas = {};
                for (let i = 0; i < firstLineWords.length; i++) {
                    console.log(`Our ${i} word in the first line is : ${firstLineWords[i]}`);
                    console.log(`Our ${i} word in the SECOND line is : ${secondLineWords[i]}`);
                    atlas[firstLineWords[i]] = secondLineWords[i];
                }
                console.log('The atlas is: ', atlas);
                let atlasJson = JSON.stringify(atlas);
                console.log('Atlas as json is: ', atlasJson);

                download(atlasJson, 'atlasJson.txt', 'text/plain');
            }
        }
    };
    rawFile.send(null);
}

function download(text, name, type) {

    var a = document.getElementById("a");
    var file = new Blob([text], {type: type});
    a.href = URL.createObjectURL(file);
    a.download = name;
}

function intoLines(text) {
    // Split the text on "\n" so that each line becomes one array element.
    var lineArr = text.split('\n');

    return lineArr;
}

function intoWords(lines) {


    var wordsArr = lines.split('" "');


    return wordsArr;

}

My question is: how can we handle these special characters, i.e. the accented vowels?

I ask because even in the IDE the question marks appeared when the .txt file was loaded as UTF-8; after I switched to ISO-8859-1 it loaded correctly.

I have also studied:

Read UTF-8 special chars from external file using Javascript

Convert special characters to HTML in Javascript

Reading a local text file from a local javascript file?

In addition, could you explain whether there is a shorter way to load files in client-side JavaScript? For example, Java has FileReader / FileWriter / BufferedWriter. Is there something similar in JavaScript?

Thank you for your help!

TobiSH
Yone
  • *"In addition..."* On SO, it's important to ask **one** question per question, not two (or more). (I was about to link to the help page that says that and...I'm not finding one. Which is a problem with the help. :-) ) – T.J. Crowder Mar 24 '18 at 18:30
  • The first step in accepting a text file is knowing which character encoding it uses. – Tom Blodget Mar 24 '18 at 22:28

2 Answers

2

It sounds like the file is encoded with ISO-8859-1 (or possibly the very similar Windows-1252).

There's no BOM or equivalent for those encodings.
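To illustrate the mismatch, here is a minimal sketch that decodes the same bytes two ways with the standard `TextDecoder` API. The sample bytes and the `decodeBytes` helper are made up for this example; they are not taken from the question's file. Note that `TextDecoder` treats the `iso-8859-1` label as Windows-1252, which is identical for accented Latin letters such as `é`.

```javascript
// Bytes of "café" as stored in ISO-8859-1: 'é' is the single byte 0xE9.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Hypothetical helper: decode raw bytes using the given encoding label.
function decodeBytes(bytes, encoding) {
    return new TextDecoder(encoding).decode(bytes);
}

// 0xE9 on its own is not a valid UTF-8 sequence, so the decoder
// substitutes U+FFFD, which is displayed as "�".
const asUtf8 = decodeBytes(latin1Bytes, 'utf-8');

// With the matching encoding, the byte maps back to 'é'.
const asLatin1 = decodeBytes(latin1Bytes, 'iso-8859-1');

console.log(asUtf8);   // "caf" followed by the replacement character
console.log(asLatin1); // "café"
```

This is the same effect as in the question: the bytes are fine, but they are interpreted with the wrong encoding.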

The only solutions I can see are:

  1. Use a (local) server and have it return the HTTP Content-Type header with the encoding identified as a charset, e.g. Content-Type: text/plain; charset=ISO-8859-1

  2. Use UTF-8 instead (e.g., open the file in an editor as ISO-8859-1, then save it as UTF-8 instead), as that's the default encoding for XHR response bodies.
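As a rough illustration of option 1, here is a minimal Node.js sketch of a local server that sends a file's bytes untouched and declares their encoding in the Content-Type header. The names `contentTypeFor` and `createTextServer`, the file name `data.txt`, and port 8080 are assumptions made for this example. It also assumes the page that issues the XHR is served from the same origin; otherwise cross-origin rules would apply.

```javascript
const http = require('node:http');
const fs = require('node:fs');

// Build the header value that tells the browser how to decode the body.
function contentTypeFor(charset) {
    return `text/plain; charset=${charset}`;
}

// Hypothetical helper: serve one file's raw bytes with an explicit charset.
function createTextServer(filePath, charset) {
    return http.createServer((req, res) => {
        fs.readFile(filePath, (err, bytes) => {
            if (err) {
                res.writeHead(404, { 'Content-Type': 'text/plain; charset=utf-8' });
                res.end('Not found');
                return;
            }
            // Send the bytes as stored on disk; the charset parameter describes them.
            res.writeHead(200, { 'Content-Type': contentTypeFor(charset) });
            res.end(bytes);
        });
    });
}

// Usage (not started here):
// createTextServer('data.txt', 'ISO-8859-1').listen(8080);
```

With a header like this, `responseText` should be decoded as ISO-8859-1 rather than with the UTF-8 default.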

T.J. Crowder
0
  1. Put your text in an .html file with the corresponding content type, for example:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    

    Enclose the text between two markers ("####" in my example), or put it in a div.

  2. Read the html page, extract the content and select the text:

     var newWindow = window.open(url); //..
     var content = newWindow.document.body.innerHTML;
     var strSep="####";
     var x = content.indexOf(strSep);
     x=x+strSep.length;    
     var y = content.lastIndexOf(strSep); 
     var points=content.slice(x, y);
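
The extraction step can also be written as a small standalone function, which makes the missing-marker case explicit. This is only a sketch: the name `extractBetween` and the choice to return `null` when the markers are absent are assumptions for illustration, not part of the original answer.

```javascript
// Return the text between the first and last occurrence of `separator`,
// or null when the separator does not appear at least twice.
function extractBetween(content, separator) {
    const first = content.indexOf(separator);
    const last = content.lastIndexOf(separator);
    if (first === -1 || first === last) {
        return null;
    }
    return content.slice(first + separator.length, last);
}

// Example with a made-up page body:
const body = '<p>####á,é,í####</p>';
console.log(extractBetween(body, '####'));
```

Checking for `null` before using the result avoids silently slicing from the wrong position when the page did not load as expected.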
    
x00