I'm trying to load a lot of images which are listed in a CSV file in the following format:

./path/to/img1.ext;label1
./path/to/img2.ext;label2

This is the script I've written:

var cv = require("opencv"),
    fs = require("fs"),
    console = require("console"),
    util = require("util"),
    lazy = require("lazy.js");

var basePath = '/some/path/';

var csvFile = fs.createReadStream(basePath + 'db.csv', {flags:'r'});

var images = [],
    labels = [];

lazy(csvFile)
.lines()
.each(function(l) {
    var d = lazy(l).split(';').toArray();
    cv.readImage(basePath + d[0], function(e, m) {
        images.push(m);
    });
    labels.push(d[1]);
});

console.log(util.inspect(images));
console.log(util.inspect(labels));

It prints two lines, each containing the representation of an empty array, [].

The images are actually getting loaded by OpenCV, because if you try to print m before pushing it into the array it correctly prints [Matrix HxW ], where H and W stand for the height and the width of the images.

EDIT: also, can you think of a better way than two separate arrays for keeping each image associated with its label?

EDIT: the problem seems to be that the images are loaded asynchronously. So the problem is my lack of experience with asynchronous programming. How can I make this work?
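
A minimal sketch of the effect, assuming a hypothetical fakeReadImage built on setTimeout as a stand-in for cv.readImage (so it runs without OpenCV). The line after the call runs before the callback does, which would explain why the arrays are still empty when they are printed:

```javascript
// Hypothetical stand-in for cv.readImage: it calls back on a later tick,
// the way an asynchronous loader would. Not part of any real library.
function fakeReadImage(path, callback) {
    setTimeout(function () {
        callback(null, '[Matrix for ' + path + ']');
    }, 0);
}

var images = [];

fakeReadImage('img1.ext', function (e, m) {
    // Runs later, after the rest of the script has already finished.
    images.push(m);
    console.log('inside callback:', images.length);
});

// Runs first: the callback above has not fired yet, so the array is empty.
console.log('right after the call:', images.length);
```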

rrrrrrrrrrrrrrrr
  • have you tried this library for node? https://github.com/caolan/async – gabereal Sep 18 '14 at 19:04
  • @gabereal: how should that help me? – rrrrrrrrrrrrrrrr Sep 18 '14 at 19:09
  • you can delay the iteration of the each loop until the readImage (which i'm assuming from your second edit is the asynchronous part) callback finishes executing. is there any good reason to use lazy? i don't see why you don't just use 'fs' and 'readline' node modules... – gabereal Sep 18 '14 at 20:09
  • see http://stackoverflow.com/questions/6156501/read-a-file-one-line-at-a-time-in-node-js/#answer-15554600 – gabereal Sep 18 '14 at 20:11

1 Answer

Here's a solution using piping and the csv2 and through2 libraries for Node, which you can find at https://github.com/rvagg/csv2 and https://github.com/rvagg/through2.

I tested this with a simulated asynchronous function using setTimeout and it worked. However, because I don't have your data file I can't test it exactly. Please let me know if there is a problem.

NOTE: I created an array of objects. Each object has the image and its label. I think this is a better solution than trying to keep two arrays with those associations. In general, if the pieces of your data are related to each other, an object will be better than two arrays :)

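var fs = require('fs');
var csv2 = require('csv2');
var th2 = require('through2');
var cv = require('opencv');

// each entry pairs an image with its label: {image: ..., label: ...}
var files = [];
var file = fs.createReadStream('test.txt');

file
.pipe(csv2({'separator': ';'}))
.pipe(th2({objectMode: true}, function(parsedLine, enc, callback){
    var me = this;
    // parsedLine is [path, label]; the next line is not handled until callback() runs
    cv.readImage(parsedLine[0], function(e, img) {
        if (e) { return callback(e); } // report a failed load instead of pushing undefined
        files.push({image: img, label: parsedLine[1]});
        me.push(parsedLine);
        callback();
    });
}))
.on('error', function(e){console.error(e);})
// a 'data' listener keeps the stream flowing so that 'end' eventually fires
.on('data', function(data){/*do something with data if you want to*/})
.on('end', function(){console.log(files);}); // all images have been loaded by now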
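
For comparison, here is a rough sketch of the same idea using only Node's built-in fs and readline modules, as suggested in the comments on the question. It assumes a reasonably recent Node version (async/await and async iteration over a readline interface). The names loadLabelledImages and readImageAsync are made up for illustration, and the image loader is passed in as a parameter so the sketch can be tried without OpenCV installed:

```javascript
const fs = require('fs');
const readline = require('readline');

// Wrap a Node-style callback loader (for example cv.readImage) in a Promise.
function readImageAsync(readImage, path) {
    return new Promise(function (resolve, reject) {
        readImage(path, function (err, img) {
            if (err) { reject(err); } else { resolve(img); }
        });
    });
}

// Read "path;label" lines one at a time and resolve with an array of
// {image, label} objects in file order. Each image finishes loading
// before the next line is handled.
async function loadLabelledImages(csvPath, basePath, readImage) {
    const rl = readline.createInterface({
        input: fs.createReadStream(csvPath),
        crlfDelay: Infinity
    });
    const entries = [];
    for await (const line of rl) {
        if (!line.trim()) { continue; } // skip blank lines
        const parts = line.split(';');
        const image = await readImageAsync(readImage, basePath + parts[0]);
        entries.push({ image: image, label: parts[1] });
    }
    return entries;
}

// Possible usage with OpenCV (untested against real data):
// loadLabelledImages('/some/path/db.csv', '/some/path/',
//     function (p, cb) { cv.readImage(p, cb); })
//     .then(function (entries) { console.log(entries); });
```

Loading one image at a time keeps the results in file order and the logic simple, at the cost of not loading several images in parallel.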
gabereal