
Almost six months ago, I asked a question on Stack Overflow: "Software to help in log analysis?"

Please look at that question before reading ahead.

It turned out that there is currently no good software available that can intermix log files based on timestamps and present them in a good UI.

I wanted to take the initiative, develop something, and open source it once it's completed.

Earlier, I worked around this by writing a quick and dirty piece of code in C++ that would generate a tab-separated file (like CSV, but tab-separated), which I would later open in Excel.

I am not satisfied with my C++ code for the following reasons:

1. It depends entirely on Excel to view the output file.
2. Since there is no UI involved, it's not easy to write out its command line every time.
3. Because of the command line's learning curve, it's not easily shareable with other team members (and the world).

For the above reasons (and a few more), I was thinking of developing it as a web solution. That way I can share a working instance with everyone.

What I have in mind is a web-based solution, something like this:

  • The user will be able to supply the input log files using HTML5's File API.
  • Then the user would tell it the timestamp format associated with each log file.
  • Thereafter, the JavaScript would process those log files into intermixed HTML output in a table.
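The processing step in the list above amounts to a k-way merge over files whose lines are already sorted by time. A minimal sketch of that idea, runnable on plain strings; the line format (a lexicographically sortable timestamp prefix) and the function name are assumptions for illustration, not part of the question:

```javascript
// Merge already-sorted log files line by line, like merging sorted arrays.
// Assumes every line begins with a lexicographically sortable timestamp,
// e.g. "2011-03-08 21:26:01 message...". Names here are illustrative.
function mergeLogs(fileTexts) {
    var lines = fileTexts.map(function (text, i) {
        return text.split("\n").filter(Boolean).map(function (line) {
            return { line: line, source: i };
        });
    });
    var cursors = lines.map(function () { return 0; });
    var merged = [];
    for (;;) {
        var best = -1;
        for (var i = 0; i < lines.length; i++) {
            if (cursors[i] < lines[i].length &&
                (best === -1 ||
                 lines[i][cursors[i]].line < lines[best][cursors[best]].line)) {
                best = i;
            }
        }
        if (best === -1) break; // every file exhausted
        merged.push(lines[best][cursors[best]]);
        cursors[best]++;
    }
    return merged;
}
```

Because only one line per file is compared at a time, this never needs more than the file contents themselves in memory, and each merged record remembers which file it came from.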

I am just a beginner in web technologies, so I need your help in determining whether this is the best way to go about it.

I want a web solution, but that doesn't mean I want the user to upload their log files for backend processing. I want a web-based, client-only solution.

Thanks for your inputs.

EDIT: Based on the comment below by Raynos:

@bits You do realise that browsers were not meant to handle large pieces of data. There was stackoverflow.com/questions/4833480/… which shows that this can cause problems.

I feel that doing this in the browser isn't the best approach. Probably I should explore backend-based solutions. Any ideas or suggestions?

bits
  • How big are the files? At some point, the browser will be unable to process it with just Javascript. – Srdjan Pejic Mar 08 '11 at 20:21
  • The files can be as big as 50 mb. Yup... maybe you are right... – bits Mar 08 '11 at 21:14
  • @bits woh. You're going to need to stress test that. – Raynos Mar 08 '11 at 21:17
  • But I guess browsers can handle gigs of memory these days. – bits Mar 08 '11 at 21:23
  • @Raynos Actually, files could be even 100 MB each. And there could be, let's say, 8 files as input. But that's just 800 MB. A computer with 1 GB of spare memory shouldn't have an issue. The only thing is that I do not want to maintain a data structure. It should be like a merge procedure of sorted arrays. Output log statements in the final table as you go. Something like that. – bits Mar 08 '11 at 21:26
  • @bits You do realise that browsers were not meant to handle large pieces of data. There was http://stackoverflow.com/questions/4833480/is-this-asking-too-much-of-a-browser which shows that this can cause problems. – Raynos Mar 08 '11 at 21:29
  • @bit took me about [10s to load 25mb](http://jsfiddle.net/Raynos/Vbmm9/3/) of text into the DOM from the file API. – Raynos Mar 08 '11 at 21:41
  • Have you heard of Splunk? http://www.splunk.com/ – Michael Mior Mar 08 '11 at 21:43

2 Answers


You're looking for an online diff tool that takes n files, each containing a list of timestamps in some order, plus extra data to be displayed in place but not parsed during the diffing.

The file upload would involve

<input id="upload" type="file">

Along with snippets of javascript

$("#upload").change(function () {
    var files = this.files;
    for (var i = 0; i < files.length; i++) {
        // capture each file in a closure so the async onload
        // callback sees the right one
        (function (file) {
            var reader = new FileReader();
            reader.onload = function (e) {
                var text = e.target.result;
                console.log(text);
            };
            reader.readAsText(file);
        }(files[i]));
    }
});

See live example.

So you have all the text; you just need to work on a parser. I hope that helps a bit.

As for the markup of the diff I would suggest something like:

<table>
 <!-- one tr per unique timestamp -->
 <tr>
  <!-- one td/textarea per file -->
  <td> <textarea /> </td>
  <td> <textarea /> </td>
 </tr>
 ...
</table>

I would recommend making this a template and using a template engine to do some of the heavy lifting.

Let's say we want to use jquery-tmpl.

Here's an example to get you started. (I spent zero time on making it look good. That's your job.)

All that's left is generating JSON data to insert into the template.
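As a sketch, the JSON handed to the template might look like the following; the field names are invented for illustration, not taken from the fiddle:

```javascript
// One possible JSON shape for the table template: one row per unique
// timestamp, one cell per input file (empty string when that file has
// no record at that time). Field names are invented, not prescribed.
var rows = [
    { timestamp: "2011-03-08 21:26:01",
      cells: ["started merge", ""] },
    { timestamp: "2011-03-08 21:26:02",
      cells: ["", "worker ready"] }
];
```

Each `cells` array has one slot per file, so the template can emit one `<td>` per slot and the columns line up across rows.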

So given your file input you should have an array of fileTexts somewhere.

We want some kind of delimiter to split it up into individual timestamp records. For simplicity's sake, let's say the newline character will work.

var fileTexts = [/* the text of each file, from the FileReader above */];
var regex = new RegExp("(timestampformat)(.*)");

for (var i = 0; i < fileTexts.length; i++) {
    var text = fileTexts[i];
    var records = text.split("\n");
    for (var j = 0; j < records.length; j++) {
        var match = regex.exec(records[j]);
        if (match) { // skip lines that don't match the timestamp format
            addToTimestamps(match[1], match[2], i);
        }
    }
}

function addToTimestamps(timestamp, text, currFileCount) {
    console.log(arguments);
    // implement it.
}
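A minimal way to flesh out `addToTimestamps`, assuming the number of input files is known up front; the `fileCount` variable and map layout below are my own additions, not part of the original answer:

```javascript
// Accumulate a map from timestamp to an array with one text slot per
// file, so each map entry becomes one table row later. The fileCount
// variable is assumed known up front (the number of input files).
var timestamps = {};
var fileCount = 2; // illustrative; set from the real file list

function addToTimestamps(timestamp, text, currFileCount) {
    var row = timestamps[timestamp];
    if (!row) {
        // first time we see this timestamp: one empty slot per file
        row = timestamps[timestamp] = [];
        for (var i = 0; i < fileCount; i++) row.push("");
    }
    row[currFileCount] += text;
}
```

This keeps the three-argument signature used in the loop above, and the resulting `timestamps` map is already in the row/cells shape a template can render.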

As per the example above.

These are the basic building blocks. Get the data from the File API. Manipulate the data into a normalised data format then use some kind of rendering tool on the data format.

Raynos
  • What you are suggesting is that I read one file at a time. Wouldn't it be faster if I read all the files simultaneously? Imagine a merge algorithm on sorted arrays. Couldn't we do this in similar fashion, merging files one line at a time based on timestamps? – bits Mar 08 '11 at 21:19
  • @bits you're going to have to iterate over every line in every file. Whether you do that one at a time in sequence or in parallel doesn't matter too much. Depends whether you want the merging to be done in the normalisation of data or in the rendering of data. Doing them in parallel is perfectly possible. Once you have all the text out of the files you can do what you want. – Raynos Mar 08 '11 at 21:24
  • Does that mean that the content of the files would be in memory before I actually start the real processing? I was imagining a `process as you read each line` solution. That was what I implemented in c++. – bits Mar 08 '11 at 21:29
  • @bits that's not possible with the HTML5 file API. It returns your entire file as a string. There is no `readline` in the file API. – Raynos Mar 08 '11 at 21:31
  • @Raynos I have edited my question. Please take a look. I think I should consider backend-based solutions. Any ideas? Or would you suggest that it's not worth it to make a web-based solution? – bits Mar 08 '11 at 21:34
  • @bits You should do stress testing. Upload a 100 MB file. Print its content to the DOM. See what happens. There are a few methods which are not really implemented at all cross-browser. Such as [`Blob.slice`](https://developer.mozilla.org/en/DOM/Blob#slice()) and [`reader.onprogress`](http://www.w3.org/TR/FileAPI/#dfn-progress-event) when used with [`reader.readAsArrayBuffer`](https://developer.mozilla.org/en/DOM/FileReader#readAsArrayBuffer()). These would allow some kind of `.readLine` functionality. – Raynos Mar 08 '11 at 21:40
  • Raynos, I think that you are wrong here, the html5 api provides the file as a dataURL - https://developer.mozilla.org/en/using_files_from_web_applications#Example.3a.c2.a0Showing_thumbnails_of_user-selected_images – idbentley Mar 08 '11 at 21:40
  • @idbentley how does a dataURL help implementing read by line functionality. You can indeed get at it piece by piece but you don't have a reader that you can just call `.nextchar` on or something similar. – Raynos Mar 08 '11 at 21:45
  • @Raynos - ah, I see. I was mistaken, I thought that the FileReader provided a richer api. Yeah, that sucks. – idbentley Mar 08 '11 at 21:53
  • @idbentley the `readAsArrayBuffer` fires periodic `progress` events. For some browser specific periods. You can also call `.slice(startInt, lenghtInt)` on the data. But where do you start? How many characters do you want? – Raynos Mar 08 '11 at 21:59

This would be fairly easy to do using JavaScript. You mentioned using the HTML5 File API above, and that is a great place to start; you can use unobtrusive JavaScript to have a callback fire when a file is selected. Inside the callback you could use any of the great JavaScript templating libraries to construct a table of elements from the uploaded file. Then on subsequent file uploads, you could dynamically interleave them into the table using their timestamps. Detecting the timestamp using JS regular expressions would be fairly straightforward, and reasonably efficient if you used a compiled form.
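A sketch of that detection step with a precompiled regex; the timestamp pattern and function name here are assumptions for illustration, not something from the question:

```javascript
// Precompiled once, reused for every line, as suggested above.
// The "YYYY-MM-DD HH:MM:SS" pattern is an assumed log format.
var TIMESTAMP = /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$/;

// Split one log line into its timestamp and message, or return null
// for lines that don't start with a timestamp (e.g. stack traces).
function parseLine(line) {
    var m = TIMESTAMP.exec(line);
    return m ? { timestamp: m[1], message: m[2] } : null;
}
```

Lines that return `null` could be appended to the previous record, which is how multi-line entries like stack traces usually stay attached to their timestamp.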

This is a fairly high level answer to the question, and if you have any questions about particular details, I'd be happy to answer those as well.

Hope this helps

idbentley
  • Just as I commented on Raynos's answer, I wanted to confirm this: I am not sure if it is possible to simultaneously read more than one file at a time and process them line by line based on timestamp. Imagine a merge algorithm used to merge two sorted arrays; I plan pretty much the same algorithm. But I am not sure if the HTML5 File API would allow that. Thanks for confirming whether that is possible or not. – bits Mar 08 '11 at 21:22
  • That doesn't really have anything to do with the html5 file api, rather it would be dependent on the Javascript library you used to parse the file handlers. The html5 api just gives you the file handlers, then you use some JS library to parse them - if this library allows concurrent parsing, then you could definitely write such an algorithm. For example, you may only want to initially show the 100 earliest items (no matter which file they were in) - in this case, you would have to be more clever. Generally, start simply and only complicate when necessary. – idbentley Mar 08 '11 at 21:39