
I need to extract just the header from a remote CSV file.

My current method is as follows:

Papa Parse can stream a file and hand me each row individually, which is great, and I can call parser.abort() to stop the stream after the first row. It looks like this:

Papa.parse(csv_file_and_path, {
    header: true,
    worker: true,
    download: true,
    step: function (row, parser) {
        // DO MY STUFF HERE
        parser.abort();
    }
});

This works fine, but because the file is remote, it has to be downloaded before it can be read. Even though the code releases control back to the browser once the first row has been parsed, the download carries on in the background. For large files it can continue for a long time after I have the information I need.

Is there a more efficient way of doing this? Can I prevent Papa Parse from downloading the whole file?

I have tried using

Papa.parse(csv_file, {
    header: true,
    download: true,
    preview: 1,
    complete: function (results) {
        // DO MY STUFF HERE
    }
});

But this does the same thing: it downloads the entire file, although, as with the first approach, it gives control back to the browser after the first line is parsed.

Single Entity

2 Answers


You can use the preview option of PapaParse:

 Papa.parse(..., {
          preview: 5, ...

Also read this: https://github.com/mholt/PapaParse/issues/47

Related topic: Javascript using File.Reader() to read line by line

Christophe Roussy
  • The preview method doesn't work. I should have mentioned this before, as I had already tested it; I'll update my question. Preview should stop the entire file from being downloaded, but in my tests it doesn't. – Single Entity Mar 29 '17 at 07:48
  • It works for me. Maybe we are using different versions, so try the latest one. Large CSV files clearly freeze the browser for me, but not when I use preview. Also note that I tested this in Firefox. – Christophe Roussy Mar 29 '17 at 08:44
  • Yeah, I tested it myself today. It works in the sense that it frees the browser up, as my original method does, but it doesn't seem to stop the download, which continues in the background afterwards. Have you monitored your network usage to see the size of the file retrieved? In my case the entire file is still downloaded in the background. – Single Entity Mar 29 '17 at 08:46
  • I did not check the network, but I will certainly have a look now that you mention it. – Christophe Roussy Mar 29 '17 at 08:47
  • OK, great, let me know what you find. – Single Entity Mar 29 '17 at 08:49
  • @SingleEntity I actually used local file select, which is another use case. – Christophe Roussy Mar 30 '17 at 10:18

The solution I came up with is very similar to the code in my original question. The difference is that I abort, handle complete, and clear the memory.

With the following method, only a single chunk of the file is downloaded. This massively reduces the bandwidth used for a large file, because no downloading continues after the first line is parsed.

Papa.parse(csv_file, {
    header: true,
    download: true,
    step: function (results, parser) {

        // DO MY THING HERE

        parser.abort();   // stop parsing after the first row
        results = null;   // attempt to release the row (this only drops the local reference)

    },
    complete: function (results) {

        results = null;   // attempt to release the results (this only drops the local reference)

    }
});
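
If only the column names are needed, the same abort-on-first-row pattern can be wrapped in a small helper. This is a minimal sketch, not something taken from the Papa Parse docs: readCsvHeader and onHeader are hypothetical names, the Papa Parse object is passed in as an argument so the helper can be tried without a network, and it assumes that with header: true the step results expose the column names as results.meta.fields.

```javascript
// Sketch: report the header fields of a remote CSV, then stop parsing.
// `papa` is the Papa Parse object (for example the global `Papa`).
// `readCsvHeader` and `onHeader` are hypothetical names used for illustration.
function readCsvHeader(papa, url, onHeader) {
    var reported = false; // make sure onHeader is called only once

    papa.parse(url, {
        header: true,
        download: true,
        step: function (results, parser) {
            if (reported) {
                return; // ignore any row delivered after the first one
            }
            reported = true;
            parser.abort(); // ask Papa Parse to stop after the first row
            // Assumption: with header: true, the column names are in meta.fields.
            onHeader(results.meta.fields);
        },
        complete: function () {
            // If no data row was ever delivered to step, report null instead.
            if (!reported) {
                reported = true;
                onHeader(null);
            }
        }
    });
}
```

It could be called as readCsvHeader(Papa, csv_file, function (fields) { /* DO MY THING HERE */ }). Whether the download really stops is still worth confirming in the browser's network tab, since that depends on how the file is fetched in chunks.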
Single Entity