
Say my company serves a large log file (4+ GB), where the most recent logs are at the top. I want to build a webpage to search that file for a keyword "Mike". Bandwidth is not a restriction, but this webpage can only be static files (i.e. no server-side functionality).

Example log file:

Joe completed Task 1234 on 2013-10-10
Joe completed Task 1235 on 2013-10-11
Mike completed Task 1236 on 2013-10-11
Joe completed Task 1237 on 2013-10-13
...

Obviously, I can't put the entire file into memory in the browser, so I'm trying to find a way to request the file, search the data as it gets downloaded, then throw away non-relevant data to save memory. I am using the xhr.onprogress event to get the partially downloaded log file via xhr.responseText and search that, but I can't reset the responseText after I'm done reading it.

Here's my algorithm so far:

var xhr = new XMLHttpRequest();
xhr.onprogress = function(e){
    var cur_len = xhr.responseText.length;
    var found_mike = xhr.responseText.indexOf("Mike") != -1 ? true : false;
    xhr.responseText = ""; //clear responseText to save memory
    console.log("%d - %s - %d", cur_len, found_mike, xhr.responseText.length);
};
xhr.open("get", "mylogfile.txt", true);
xhr.send();

I would expect the console to say something like 234343 - false - 0, but instead I get 234343 - false - 234343, and the browser runs out of memory (since responseText isn't being cleared).

Is there a way I can discard the responseText so that the browser can download and process a file without holding the entire file in memory?

EDIT: Also, if responseText is read-only, why doesn't it throw an error/warning?

user749618
  • I'm pretty sure the response isn't available until the request completes. – Musa Nov 18 '13 at 21:44
  • [xhr.response](http://www.w3.org/TR/XMLHttpRequest/#the-response-attribute) isn't available, but [xhr.responseText](http://www.w3.org/TR/XMLHttpRequest/#the-responsetext-attribute) is available during LOADING. – user749618 Nov 18 '13 at 23:12
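On the EDIT: a read-only attribute such as responseText is exposed as a property with a getter and no setter. In non-strict code an assignment to such a property is silently ignored, which matches the console output in the question; in strict mode the same assignment throws a TypeError. The sketch below shows the strict-mode case, assuming a plain object (`fakeXhr`, a hypothetical stand-in, not a real XMLHttpRequest) with a getter-only property.

```javascript
// Hypothetical stand-in for an XHR object: responseText has a getter
// but no setter, like a read-only DOM attribute.
var fakeXhr = Object.defineProperty({}, "responseText", {
    get: function () { return "Joe completed Task 1234"; }
});

// Tries to clear responseText from strict-mode code and reports what happened.
function tryToClear(obj) {
    "use strict";
    try {
        obj.responseText = ""; // no setter, so strict mode throws here
        return "no error";
    } catch (e) {
        return e.name;
    }
}

console.log(tryToClear(fakeXhr));          // name of the error that was thrown
console.log(fakeXhr.responseText.length);  // the value is unchanged either way
```

Adding "use strict" to the onprogress handler would surface the failed assignment, but it would not free any memory: the browser still keeps the full response.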

1 Answer


I asked a friend, and he had a great answer: use Range headers to download the file one chunk at a time (stackoverflow question, jsfiddle)

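var chunk_size = 100000; // 100 kB chunks
var regexp = /Mike/g;
var mikes = []; // offset of each match (a byte offset only for single-byte text such as ASCII)
function next_chunk(pos, file_len){
    if(pos >= file_len){
        return; // past the last byte (bytes are numbered 0 to file_len - 1)
    }
    // Range ends are inclusive, so this asks for at most chunk_size bytes
    var chunk_end = Math.min(pos + chunk_size, file_len) - 1;
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function(){
        if(xhr.readyState == 4 && xhr.status == 206){
            // push mikes to result
            // (a match split across two chunks is not detected)
            var match;
            while ((match = regexp.exec(xhr.responseText)) !== null) {
                mikes.push(pos + match.index);
            }
            // the real file length is the part after "/" in Content-Range
            file_len = parseInt(xhr.getResponseHeader("Content-Range").split("/")[1], 10);
            // request next chunk
            next_chunk(chunk_end + 1, file_len);
        }
    };
    xhr.open("get", "mylogfile.txt", true);
    xhr.setRequestHeader("Range", "bytes=" + pos + "-" + chunk_end);
    xhr.send();
}
next_chunk(0, chunk_size); // the file length is unknown at first, so guess one chunk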
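One thing to watch with chunked searching: a keyword that straddles a chunk boundary (say "Mi" at the end of one chunk and "ke" at the start of the next) is missed if each chunk is searched on its own. A minimal sketch of one way around that, assuming the chunks arrive in order as strings: keep the last `keyword.length - 1` characters of what has been seen and prepend them to the next chunk. `createChunkSearcher` is a hypothetical helper, not part of the code above.

```javascript
// Hypothetical helper: searches text that arrives in consecutive chunks and
// records the absolute character offset of every occurrence of `keyword`.
function createChunkSearcher(keyword) {
    var tail = "";     // last keyword.length - 1 characters seen so far
    var consumed = 0;  // total characters pushed so far
    var offsets = [];
    return {
        offsets: offsets,
        push: function (chunk) {
            var text = tail + chunk;
            var base = consumed - tail.length; // absolute offset of text[0]
            var idx = text.indexOf(keyword);
            while (idx !== -1) {
                offsets.push(base + idx);
                idx = text.indexOf(keyword, idx + 1);
            }
            consumed += chunk.length;
            // The tail is shorter than the keyword, so it can never contain
            // a complete match and nothing is reported twice.
            tail = text.slice(Math.max(0, text.length - (keyword.length - 1)));
        }
    };
}

// Usage: "Mike" is split across the first two chunks.
var searcher = createChunkSearcher("Mike");
searcher.push("Joe did X\nMi");
searcher.push("ke did Y\nMike");
console.log(searcher.offsets);
```

In the Range-based loop, each responseText would be passed to `push` instead of being searched directly. Only the short tail is kept between requests, so memory use stays at roughly one chunk.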
user749618
  • You should probably stay away from base 10 if you want better performance, my recommendation for chunk size would be 134217728 byte chunks=128Mb chunks if you want to just search the string, or 32Mb=33554432 byte chunks if you want to do more advanced cpu-intensive operations. The reason for why I would suggest such small constraints is because IE really is quite a bit of a memory hog. – Jack G Dec 06 '16 at 02:48
  • This is a doable approach. However keep in mind that you require the server to allow ranges in requests by having `Accept-Ranges: `. [mdn](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Ranges) – lllllll Apr 04 '17 at 19:16
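Following up on the Accept-Ranges comment, here is a rough sketch of checking for range support before starting the chunked download. It assumes the file is served from the same origin (cross-origin pages may not be allowed to read this header) and that the server answers HEAD requests. `advertisesByteRanges` and `checkRangeSupport` are hypothetical names. Some servers honor Range without sending the header, so a `false` result means "not advertised" rather than "definitely unsupported".

```javascript
// Returns true only when the header value explicitly lists the "bytes" unit.
function advertisesByteRanges(headerValue) {
    if (typeof headerValue !== "string") {
        return false; // header missing (getResponseHeader returns null)
    }
    return headerValue.split(",").some(function (unit) {
        return unit.trim().toLowerCase() === "bytes";
    });
}

// Hypothetical wrapper: sends a HEAD request (no body is downloaded) and
// reports whether the server advertises byte-range support.
function checkRangeSupport(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onload = function () {
        callback(advertisesByteRanges(xhr.getResponseHeader("Accept-Ranges")));
    };
    xhr.onerror = function () {
        callback(false);
    };
    xhr.open("HEAD", url, true);
    xhr.send();
}
```

For example, `checkRangeSupport("mylogfile.txt", function (ok) { if (ok) next_chunk(0, chunk_size); })` would start the search only when ranges are advertised.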