
How do I read the first non-empty line from a text file in plain JavaScript, possibly using a new FileReader()?

I want only the first non-empty line, not the whole text of the file (which might be enormous). By "non-empty line" I mean a line ending with \r\n and containing at least one non-blank character.

One practical goal is to pick the first "good" line of a huge CSV file, to get its headers (the names of the dataset's variables).

The file I want to read is currently on my local hard disk, but once the script is ready I would also like to put it online on my domain, along with the script that processes it.

Thank you!

  • What form do you have the file in? Is it a `File` object, or are you retrieving it via `fetch` or something else? They have different methods for getting the file content as text, which is what you'll want to do; then split the string on line breaks and walk through the array of lines until you find one that matches your criteria. – Mark Hanna Oct 08 '21 at 00:45
  • @Mark Hanna I have it in the form of a huge CSV file containing data. I only want the header line; reading the entire dataset would be useless at this stage. I would like some way to read one line at a time from a file without loading it all into memory. The file is on my hard disk, but later I will want to put it online with the JS script and perform the same task. –  Oct 08 '21 at 00:49
  • Hmm, I don't know of any way to do that in JavaScript. You may need some custom back-end code that can deal more directly with the file system, reads only the first part of your file, and then perhaps exposes that via an API. I'm not sure how much control you have over your back-end system, though. – Mark Hanna Oct 08 '21 at 00:51
  • @Mark Hanna I think I will have full control, because the data file will be either on my PC or online on my own domain, alongside the JS script that reads it. I guess I need some sort of readLine() functionality, which the text file readers of most languages provide. –  Oct 08 '21 at 00:53
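Mark Hanna's suggestion (get the content as a string, split it on line breaks, then walk the lines until one matches) can be sketched as below. It assumes the whole text is already in memory, so it only solves the "find the line" half of the problem, not the "don't load the whole file" half. The name `firstNonEmptyLine` is hypothetical, and accepting both `\r\n` and `\n` endings is an assumption.

```javascript
// Hypothetical helper: return the first line that contains at least one
// non-whitespace character, or undefined if there is none.
// Accepts both "\r\n" (Windows) and "\n" (Unix) line endings.
function firstNonEmptyLine(text) {
  const lines = text.split(/\r?\n/);
  return lines.find((line) => /\S/.test(line));
}

// Example: leading empty and whitespace-only lines are skipped.
const sample = "\r\n   \r\nname,age,city\r\nAlice,30,Paris\r\n";
console.log(firstNonEmptyLine(sample)); // prints: name,age,city
```

Note that `split` scans the entire string, which is fine for a small file but is exactly the cost you want to avoid for a huge one.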

2 Answers


I think you want something kinda like this:

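// Node.js only: require() and the fs module are not available in a browser.
const fs = require('fs');

// Note: fs.readFile loads the whole file into memory before calling back.
fs.readFile('/testFile.csv', 'utf8', function (err, data) {
  if (err) {
    return console.error(err);
  }
  // Match the first line containing at least one non-whitespace character.
  const firstLineRegEx = /^.*\S.*$/m;
  const match = firstLineRegEx.exec(data);
  const csvHeader = match ? match[0] : null;
  console.log(csvHeader);
});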
  • Thank you. Is this JavaScript? What is require()? I want to make sure I am not loading the entire monster file, just one line (or at least the smallest possible chunk of data). –  Oct 08 '21 at 01:54
  • @James You should probably mention that this is using Node.js – solarshado Oct 08 '21 at 02:04
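The `fs.readFile` call in this answer reads the whole file into memory before the callback runs, which is exactly the concern raised in the first comment. A Node.js-only sketch that stops early instead uses `fs.createReadStream` with the built-in `readline` module, which yields one line at a time. Once the first non-empty line arrives, the stream is destroyed, so only the first chunk or two (64 KiB each by default) should be read rather than the whole file. `firstNonEmptyLineOfFile` is a hypothetical name.

```javascript
// Node.js only: read a file lazily and stop at the first non-empty line.
const fs = require('fs');
const readline = require('readline');

async function firstNonEmptyLineOfFile(path) {
  const stream = fs.createReadStream(path, { encoding: 'utf8' });
  // crlfDelay: Infinity makes "\r\n" always count as a single line break,
  // even when "\r" and "\n" arrive in different chunks.
  const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
  try {
    for await (const line of rl) {
      if (/\S/.test(line)) {
        return line; // leaving the loop closes the readline interface
      }
    }
    return undefined; // the file has no non-empty line
  } finally {
    stream.destroy(); // stop reading the rest of the file
  }
}

// Example usage (the path is a placeholder):
// firstNonEmptyLineOfFile('/testFile.csv').then(console.log);
```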

Assuming you want to stick to client-side/in-browser JavaScript (as opposed to Node.js), there's no built-in readLine()-style way to open a file and read it line by line; that kind of direct filesystem access simply isn't available. You can certainly process it line by line once it's fully loaded and you've split it into lines.

If you want to be sure that you only ever load the first line in the browser, you'll need some server-side code for that. Node.js may or may not be a good option, depending on your existing website.

The FileReader API you mentioned looks like it's geared towards handling arbitrary, user-provided files, which is probably the wrong direction if you'll always be in control of what file to load.

You've actually got two distinct problems here: how do I load only part of a file in client-side JS, and how do I find the header line in a string containing CSV-formatted data? (You could probably break this down differently, but it's multiple parts regardless.)
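One way to tackle the first of those problems, which this answer doesn't cover (so treat it as a swapped-in technique rather than the answerer's recommendation), is the Streams API. In current browsers, both `file.stream()` on a `File` and `response.body` from `fetch()` expose a `ReadableStream` of bytes that can be read chunk by chunk and cancelled as soon as the first non-empty line is complete, so reading stops there instead of continuing through the whole file. The sketch assumes UTF-8 text with `\n` or `\r\n` line endings; `firstNonEmptyLineFromStream` is a hypothetical name.

```javascript
// Hypothetical helper: read a ReadableStream of bytes (e.g. file.stream() or
// response.body) chunk by chunk and stop at the first non-empty line.
// Assumes UTF-8 text with "\n" or "\r\n" line endings.
async function firstNonEmptyLineFromStream(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder('utf-8');
  let buffered = '';
  while (true) {
    const { value, done } = await reader.read();
    // { stream: true } keeps a multi-byte character split across chunks intact;
    // the final decode() call flushes anything still pending.
    buffered += done ? decoder.decode() : decoder.decode(value, { stream: true });
    const lines = buffered.split(/\r?\n/);
    // Unless the stream has ended, the last piece may be an incomplete line.
    const complete = done ? lines : lines.slice(0, -1);
    const found = complete.find((line) => /\S/.test(line));
    if (found !== undefined) {
      await reader.cancel(); // stop reading the rest of the file/response
      return found;
    }
    if (done) {
      return undefined; // no non-empty line anywhere in the stream
    }
    // All complete lines so far were blank; keep only the partial last line.
    buffered = lines[lines.length - 1];
  }
}
```

In a browser this might be called as `firstNonEmptyLineFromStream(file.stream())` for a file picked with `<input type="file">`, or as `firstNonEmptyLineFromStream((await fetch('data.csv')).body)` for a file on your own domain, where `data.csv` is a placeholder.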

solarshado
  • Thank you. I am trying to understand by reading around on the web too. Is it possible that a solution lies in using FileReader.readAsArrayBuffer(file) and then iterating through each character to find an end of line? Would this allow loading only a chunk of the file? –  Oct 08 '21 at 02:46
  • I see a couple of problems with that approach: First, according to MDN, all the `FileReader.readAs()` methods only set `result` "once finished", which presumably means "after processing/loading the entire input". Second, while it looks like you *could* loop through an `ArrayBuffer` like that, I've not seen anything that implies the entire buffer isn't loaded into memory, *and* its byte-wise view of the data seems like it would only make processing it more difficult than necessary (even before considering the possibility of multi-byte Unicode characters). – solarshado Oct 08 '21 at 03:09
  • I am now reading this gist: https://gist.github.com/afreeland/8184438 which actually seems to yield the correct result. However, I am not sure about the author's claim that the entire file is not loaded. He says "Convert entire ArrayBuffer to string --avoided so not all of ArrayBuffer would have to come into memory". Is this really so? If he is right, this could be a simple solution. Do you see any problems? Why would the file not be loaded entirely when using "reader.readAsArrayBuffer"? –  Oct 08 '21 at 03:24
  • I'm inclined to trust (my interpretation of) the MDN documentation over comments in a random gist. To be blunt, I think the author is just wrong about what's happening "under the hood" of their code. It *does* look like it avoids converting the entire stream of bytes into a string, but that, alone, doesn't seem particularly useful. That said, I don't doubt that it successfully returns the data it's supposed to; but you could just use `readAsText()` and process the resulting string. – solarshado Oct 08 '21 at 03:59
  • Yes, I agree with you. The code does the job, as far as I could test. It remains to establish with more certainty whether it loads only a chunk (like 4-8K) or the whole thing. If it actually loads the whole file, the code becomes useless and even dangerous. I will go and read whatever documentation I can find, and report back here if I learn more. If you have more information or a complete solution, of course feel free to post an answer. Coming from other languages and being, for now, a JS enthusiast, I would not feel good without finding a reasonable solution to such a basic task. Thanks a lot! –  Oct 08 '21 at 06:40
  • I am actually debugging the program, and at the line byteLength = data.byteLength; the debugger says bytelength: 8738 data: ArrayBuffer(8738). This seems to indicate that chunks of around 8K are being read, so the author may be right. Am I interpreting this wrong? –  Oct 08 '21 at 07:08
  • I found another one, more complete, which seems to be based on the same concept: https://gist.github.com/peteroupc/b79a42fffe07c2a87c28 –  Oct 08 '21 at 08:22
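The general idea of reading a file in chunks can be sketched with `Blob.prototype.slice()`, which returns a new Blob covering only a byte range; reading that smaller Blob loads those bytes rather than the whole file. This is not the code from either gist: it uses `blob.arrayBuffer()` in place of `FileReader.readAsArrayBuffer()`, and the 8 KiB default chunk size and the name `firstNonEmptyLineBySlices` are assumptions. A `TextDecoder` with `{ stream: true }` also addresses the multi-byte-character concern raised in the comments.

```javascript
// Hypothetical helper: read a Blob/File in fixed-size slices and stop at the
// first non-empty line. Only the slices actually read are loaded.
// Assumes UTF-8 text; chunkSize (8 KiB by default) is an arbitrary choice.
async function firstNonEmptyLineBySlices(blob, chunkSize = 8192) {
  const decoder = new TextDecoder('utf-8');
  let buffered = '';
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() only describes a byte range; arrayBuffer() reads that range.
    const bytes = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    // { stream: true } keeps a character split between two slices intact.
    buffered += decoder.decode(bytes, { stream: true });
    const lines = buffered.split(/\r?\n/);
    buffered = lines.pop(); // the last piece may be an incomplete line
    const found = lines.find((line) => /\S/.test(line));
    if (found !== undefined) {
      return found;
    }
  }
  // End of file: whatever is left is the final line (no trailing newline).
  buffered += decoder.decode();
  return /\S/.test(buffered) ? buffered : undefined;
}
```

For a `File` from an `<input type="file">`, this would be called as `firstNonEmptyLineBySlices(file)`. A smaller `chunkSize` reads fewer bytes past the header at the cost of more read calls.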