
I have a large json file that looks like this:

[
 {"name": "item1"},
 {"name": "item2"},
 {"name": "item3"}
]

I want to stream this file (pretty easy so far), run an asynchronous function (that returns a promise) for each line, and edit that line when the promise resolves or rejects.

Partway through processing, the input file could look like this:

[
 {"name": "item1", "response": 200},
 {"name": "item2", "response": 404},
 {"name": "item3"} // not processed yet
]

I do not wish to create another file; I want to edit the SAME FILE on the fly (if possible!).

Thanks :)

Shprink

3 Answers


I don't really answer the question, but I don't think it can be answered in a satisfactory way anyway, so here are my 2 cents.

I assume that you know how to stream the file line by line and run the function, and that the only problem you have is editing the file that you are reading from.

Consequences of inserting

It is not possible to natively insert data into the middle of any file (which is what you would be doing by changing the JSON live). A file can only grow at its end.

So inserting 10 bytes of data at the beginning of a 1GB file means that you need to write 1GB to the disk (to move all the data 10 bytes further).

Your filesystem does not understand JSON; it just sees that you are inserting bytes in the middle of a big file, so this is going to be very slow.

So, yes, it is possible to do: write a wrapper over the NodeJS file API with an insert() method.

Then write some more code that figures out where to insert bytes into the JSON file without loading the whole file, and without producing invalid JSON at the end.
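
Just to make the cost concrete, here is a naive sketch of such an insert() helper (the name and signature are made up for illustration). It reads everything after the insertion point into memory and writes it back shifted, which is exactly the "rewrite the rest of the file" cost described above:

var fs = require('fs');

// Naive insert(path, position, text, callback): shifts the whole tail of the
// file to make room, so the cost grows with everything after `position`.
function insert(path, position, text, callback) {
    fs.open(path, 'r+', function(err, fd) {
        if (err) return callback(err);
        fs.fstat(fd, function(err, stats) {
            if (err) return callback(err);
            var tail = Buffer.alloc(stats.size - position);
            fs.read(fd, tail, 0, tail.length, position, function(err) {
                if (err) return callback(err);
                var insertion = Buffer.from(text);
                // write the new bytes, then rewrite the old tail after them
                fs.write(fd, insertion, 0, insertion.length, position, function(err) {
                    if (err) return callback(err);
                    fs.write(fd, tail, 0, tail.length, position + insertion.length, function(err) {
                        fs.close(fd, function() { callback(err); });
                    });
                });
            });
        });
    });
}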

Now I would not recommend it :)

=> Read this question: Is it possible to prepend data to a file without rewriting?

Why do it then?

I assume that you want to either:

  • Be able to kill your process at any time, and easily resume work by reading the file again.
  • Retry partially processed files to fill in only the missing bits.

First solution: Use a database

Abstracting the work needed to live-edit files at arbitrary places is the whole reason databases exist.

They all exist only to abstract the magic behind UPDATE mytable SET name = 'a_longer_name_than_the_name_that_was_there_before' WHERE name = 'short_name'.

Have a look at LevelUP/Down, sqlite, etc...

They will abstract all the magic that needs to be done in your JSON file!
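
For instance, here is a rough sketch with the sqlite3 npm package (the table name and schema are made up) showing what the "edit one line" step turns into:

var sqlite3 = require('sqlite3');
var db = new sqlite3.Database('./items.db');

db.serialize(function() {
    db.run('CREATE TABLE IF NOT EXISTS items (name TEXT PRIMARY KEY, response INTEGER)');
    db.run('INSERT OR IGNORE INTO items (name) VALUES (?), (?), (?)',
           ['item1', 'item2', 'item3']);

    // "editing a line" is now a cheap UPDATE instead of rewriting a file
    db.run('UPDATE items SET response = ? WHERE name = ?', [200, 'item1']);

    db.each('SELECT name, response FROM items', function(err, row) {
        console.log(row);
    });
});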

Second solution: Use multiple files

When you stream your file, write two new files!

  • One that contains the current position in the input file and the lines that need to be retried
  • The other one containing the expected result.

You will also be able to kill your process at any time and restart it (see the sketch below).
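
A minimal sketch of that idea, assuming a progress.json checkpoint file, an LDJSON result file and a processItem() stand-in for your real promise-returning function (all three names are made up):

var fs = require('fs');
var readline = require('readline');

// processItem() stands in for your real asynchronous function
function processItem(item) {
    return new Promise(function(resolve) {
        setTimeout(function() { resolve(200); }, 100);
    });
}

var progress = { done: 0 };
try { progress = JSON.parse(fs.readFileSync('./progress.json', 'utf8')); } catch (e) {}

var lineNumber = 0;
var pending = Promise.resolve();
var rl = readline.createInterface({ input: fs.createReadStream('./data.json') });

rl.on('line', function(line) {
    lineNumber++;
    if (lineNumber <= progress.done) return;   // already handled in a previous run
    if (!/"name"/.test(line)) return;          // skip the [ and ] lines

    var current = lineNumber;
    var item = JSON.parse(line.trim().replace(/,$/, ''));
    // chain the work so results are written in input order and the checkpoint stays valid
    pending = pending.then(function() {
        return processItem(item).then(function(status) {
            item.response = status;
            fs.appendFileSync('./result.ldjson', JSON.stringify(item) + '\n');
            progress.done = current;
            fs.writeFileSync('./progress.json', JSON.stringify(progress));
        });
    });
});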

Eloims

According to this answer, writing to the same file while reading from it is not reliable. As a commenter there says, it is better to write to a temporary file, then delete the original and rename the temp file over it.

To create a stream of lines you can use byline. Then for each line, apply some operation and pipe it out to the output file.

Something like this:

var fs = require('fs');
var stream = require('stream');
var util = require('util');
var LineStream = require('byline').LineStream;

function Modify(options) {
    stream.Transform.call(this, options);
}
util.inherits(Modify, stream.Transform);

Modify.prototype._transform = function(chunk, encoding, done) {
    var self = this;
    setTimeout(function() {
        // your modifications here; note that the exact regex depends on
        // your json format and is probably the most brittle part of this
        var modifiedChunk = chunk.toString();
        if (modifiedChunk.search('"response":[^,}]+') === -1) {
            modifiedChunk = modifiedChunk
                .replace('}', ', "response": ' + new Date().getTime() + '}');
        }
        // byline strips the newline, so add it back for every line
        self.push(modifiedChunk + '\n');
        done();
    }, Math.random() * 2000 + 1000); // to simulate an async modification
};

var inPath = './data.json';
var outPath = './out.txt';
fs.createReadStream(inPath)
    .pipe(new LineStream())
    .pipe(new Modify())
    .pipe(fs.createWriteStream(outPath))
    .on('close', function() {
        // replace the input file with the output file
        fs.unlink(inPath, function(err) {
            if (err) { return console.error(err); }
            fs.rename(outPath, inPath, function(err) {
                if (err) { console.error(err); }
            });
        });
    });

Note that the above results in only one async operation running at a time. You could also save the modifications to an array and, once all of them are done, write the lines from the array to a file, like this:

var fs = require('fs');
var stream = require('stream');
var LineStream = require('byline').LineStream;

var modifiedLines = [];
var modifiedCount = 0;
var readingDone = false;
var inPath = './data.json';
var allModified = new Promise(function(resolve, reject) {

    var lines = fs.createReadStream(inPath).pipe(new LineStream());
    lines.on('data', function(chunk) {
        modifiedLines.length++;          // reserve a slot so the output keeps the input order
        var index = modifiedLines.length - 1;
        setTimeout(function() {
            // your modifications here
            var modifiedChunk = chunk.toString();
            if (modifiedChunk.search('"response":[^,}]+') === -1) {
                modifiedChunk = modifiedChunk
                    .replace('}', ', "response": ' + new Date().getTime() + '}');
            }
            modifiedLines[index] = modifiedChunk;
            modifiedCount++;
            // resolve only once the whole file has been read and every line modified
            if (readingDone && modifiedCount === modifiedLines.length) {
                resolve();
            }
        }, Math.random() * 2000 + 1000);
    });
    lines.on('end', function() {
        readingDone = true;
        if (modifiedCount === modifiedLines.length) {
            resolve();
        }
    });

}).then(function() {
    fs.writeFile(inPath, modifiedLines.join('\n'), function(err) {
        if (err) { console.error(err); }
    });
}).catch(function(reason) {
    console.error(reason);
});

If, instead of lines, you wish to stream chunks of valid JSON (which would be a more robust approach), take a look at JSONStream.
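
For completeness, here is a rough sketch of that approach, with a made-up fakeRequest() standing in for your real promise-returning call; it still writes to a second file, which you would then rename over the original as above:

var fs = require('fs');
var stream = require('stream');
var JSONStream = require('JSONStream');

// fakeRequest() stands in for your real asynchronous function
function fakeRequest(name) {
    return new Promise(function(resolve) {
        setTimeout(function() { resolve(200); }, 500);
    });
}

var addResponse = new stream.Transform({ objectMode: true });
addResponse._transform = function(obj, encoding, done) {
    var self = this;
    fakeRequest(obj.name).then(function(status) {
        obj.response = status;
        self.push(obj);
        done();
    }, done);
};

fs.createReadStream('./data.json')
    .pipe(JSONStream.parse('*'))       // emits one object per element of the root array
    .pipe(addResponse)
    .pipe(JSONStream.stringify())      // re-assembles a valid JSON array
    .pipe(fs.createWriteStream('./out.json'));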

ekuusela

As mentioned in the comment, the file you have is not proper JSON, although it is valid in JavaScript. To generate proper JSON, JSON.stringify() could be used. Nonstandard JSON also makes life difficult for anyone else who has to parse it, so I would recommend producing a new output file instead of keeping the original one.

However, it is still possible to parse the original file line by line via eval('(' + procline + ')');, although it is not secure to feed external data into node.js like this.

const fs = require('fs');
const readline = require('readline');
const fr = fs.createReadStream('file1');
const rl = readline.createInterface({
    input: fr
});


rl.on('line', function (line) {
    // match both {"name": ...} and {name: ...} style lines
    if (line.match(/\{\s*"?name"?\s*:/)) {
        var procline = line.trim();
        // strip a trailing comma so the line evaluates as a single object
        if (procline.slice(-1) === ',') {
            procline = procline.substring(0, procline.length - 1);
        }
        var lineObj = eval('(' + procline + ')');
        lineObj.response = 200;
        console.log(JSON.stringify(lineObj));
    }
});

The output would be like this:

{"name":"item1","response":200}
{"name":"item2","response":200}
{"name":"item3","response":200}

This is line-delimited JSON (LDJSON), which can be useful for streaming, since it needs no leading or trailing [ and ] and no separating commas. There is an ldjson-stream package for it as well.
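
To show why this format streams nicely, here is a minimal sketch that reads such output back one record at a time with plain JSON.parse (the file name out.ldjson is assumed), no eval needed:

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
    input: fs.createReadStream('out.ldjson')   // assumed name for the LDJSON output above
});

rl.on('line', function (line) {
    // every line is a complete JSON document, so plain JSON.parse is enough
    const record = JSON.parse(line);
    console.log(record.name, record.response);
});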

mcku