I am processing a folder of 400+ XML files, converting/reducing each one to a subset of its data as JSON, and then inserting into MongoDB one JSON file at a time. (The files are too big to push into one large JSON file and simply run mongoimport.)
The following code works OK for a path/folder containing a single XML file, apart from the output filename, which I think I can fix.
The problem is that it can only handle one file, which defeats the purpose. I'm not sure whether the issue is my inexperience with Node.js-style asynchronous coding, or whether MongoDB is allowing the file loop to keep inserting before the first insert has completed.
var fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();

fs.readdir('/Users/urfx/data', function(err, files) {
  files.filter(function(file) { return file.substr(-4) == '.xml' })
    .forEach(function(file) {
      fs.readFile(file, function(err, data) {
        // parse some xml files and return reduced set of JSON data (works)
        parser.parseString(data, function (err, result) {
          var stuff = [inspectFile(result)];
          var json = JSON.stringify(stuff); //returns a string containing the JSON structure by default
          //make a file copy of the transformed data
          fs.writeFile(file+'_establishments.json', json, function (err) {
            if (err) throw err;
            console.log('file saved!');
            // write to mongoDB collection
            fs.readFile(file+'_establishments.json', function(err, data) {
              mongoInsert(data);
            });
          });
        });
      });
    });
});
Help! I'm going loopy on this one. It bombs with more than one file. Perhaps the problem is that MongoDB is still processing the first JSON array when the second one kicks off.
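In other words, I suspect I need to wait for each insert's callback before starting on the next file. A rough sketch of what I have in mind (hypothetical: my mongoInsert doesn't currently take a completion callback, and jsonFiles here just stands for the list of generated .json paths):

// sketch only: serialise the inserts by recursing from the completion callback
var i = 0;
(function insertNext() {
  var file = jsonFiles[i++];            // jsonFiles: hypothetical array of generated .json paths
  if (!file) return console.log('all inserts finished');
  fs.readFile(file, function (err, data) {
    if (err) throw err;
    mongoInsert(JSON.parse(data), function (err) {   // assumes mongoInsert(docs, done)
      if (err) throw err;
      insertNext();                     // only start the next file once this insert has completed
    });
  });
})();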
Following the pointers from tandrewnichols I made the improvements below. I then faced data errors (perhaps I always had them). It does look like a Mongo issue, because the JSON files all import OK one by one. I'm out of time and can't get to the bottom of it, because the individual .json files are too large to compare visually and too different to diff ;)
So I changed the purpose of this routine to just spit out the .json files (commented out the line that writes to Mongo), then ran a simple shell script that uses mongoimport (appended below). That got me where I needed to get.
All things (.json files) being equal, the changes below now work, so thanks again tandrewnichols.
My solution uses a serial loop over the fs operations, as opposed to a parallel loop (see my comments).
// fs, xml2js and parser are set up as in the first snippet;
// path holds the data directory (e.g. '/Users/urfx/data/')
fs.readdir(path, function(err, files) {
  files = files.filter(function(file) { return file.substr(-4) == '.xml' })
  var i = 0;
  // serial loop: next() only moves on once the current file has been handled
  (function next() {
    var file = files[i++];
    if (!file) return console.log(null, "end of dir");
    file = path + file;
    fs.readFile(file, function(err, data) {
      // parse some xml files and return reduced set of JSON data (works)
      parser.parseString(data, function (err, result) {
        console.log("3. result = " + result);
        var stuff = xmlToJSON(result);
        var json = JSON.stringify(stuff); //returns a string containing the JSON structure by default
        //make a file copy of the transformed data
        var fileName = file.replace('.xml', '_establishments.json');
        fs.writeFile(fileName, json, function (err) {
          if (err) throw err;
          console.log(fileName + ' saved!'); // thanks to tandrewnichols
        });
        mongoInsert(stuff); // insert the parsed objects directly -- turns out I have some voodoo in the json file output
        next();
      });
    });
  })();
});
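For completeness, my actual mongoInsert isn't shown above. A minimal sketch of how it could look with the official mongodb driver (3.x-style callback API; the connection string and db/collection names are placeholders, and the completion callback is optional):

// sketch only -- not the mongoInsert I actually used
var MongoClient = require('mongodb').MongoClient;

function mongoInsert(docs, done) {
  done = done || function (err) { if (err) throw err; };
  MongoClient.connect('mongodb://localhost:27017', function (err, client) {
    if (err) return done(err);
    client.db('db_name_here')
      .collection('collection_name_here')
      .insertMany(docs, function (err, result) {   // docs must be an array of documents
        client.close();
        done(err, result);
      });
  });
}

Opening and closing a connection per file is wasteful for 400+ files; connecting once before the loop and reusing the client would be better, but the sketch keeps it self-contained.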
Here is the shell script; --jsonArray is needed because each output file contains a JSON array rather than newline-delimited documents.
for i in *.json; do
  mongoimport -d db_name_here -c collection_name_here --type json --file "$i" --jsonArray
done
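I run it from the folder the _establishments.json files were written to, since the *.json glob only picks up files in the current directory.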