
Please excuse the rookie question as I'm not a programmer :)

We're using Pentaho 8

I'm looking for a way to have JavaScript or Java read a directory and return the names of any files older than a date that will be provided by a Pentaho parameter.

Here is what I currently have using a Modified Java Script Value step that only lists the directory contents:

var _getAllFilesFromFolder = function(dir) {

var filesystem = require("fs");
var results = [];

filesystem.readdirSync(dir).forEach(function(file) {

    // Build the full path; the backslash must be escaped inside a string literal
    file = dir + '\\' + file;
    var stat = filesystem.statSync(file);

    if (stat && stat.isDirectory()) {
        // Recurse into subdirectories and merge their files into the result
        results = results.concat(_getAllFilesFromFolder(file));
    } else {
        results.push(file);
    }

});

return results;

};

Is JavaScript/Java the right way to do this?
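
For reference, a minimal sketch of what a date-filtered version of the function above could look like. It keeps the same Node-style `fs` calls as the snippet; `getFilesOlderThan` and its `cutoff` argument are hypothetical names, and `cutoff` is assumed to be a `Date` already built from the Pentaho parameter. As far as I know, the Modified Java Script Value step runs on an embedded JVM JavaScript engine rather than Node.js, so `require("fs")` may not be available there; treat this as an illustration of the logic only.

```javascript
var fs = require("fs");
var path = require("path");

// Recursively collect files under `dir` last modified before `cutoff`.
// `cutoff` is assumed to be a Date (e.g. parsed from the parameter value).
function getFilesOlderThan(dir, cutoff) {
    var results = [];

    fs.readdirSync(dir).forEach(function (name) {
        // path.join picks the right separator for the platform
        var fullPath = path.join(dir, name);
        var stat = fs.statSync(fullPath);

        if (stat.isDirectory()) {
            // Descend into subdirectories and merge their matches
            results = results.concat(getFilesOlderThan(fullPath, cutoff));
        } else if (stat.mtime.getTime() < cutoff.getTime()) {
            // Keep only files whose modification time is before the cutoff
            results.push(fullPath);
        }
    });

    return results;
}
```

Called as `getFilesOlderThan("C:\\some\\dir", new Date("2019-12-03"))` (path and date are placeholders), it returns an array of full paths.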

jdids
  • Both could work. *provided by a Pentaho parameter*: how do you access it, and how does it get passed in? Pentaho is Java world, as far as I know. – Curiosa Globunznik Dec 03 '19 at 16:57
  • It's passed by a Table Input step. – jdids Dec 03 '19 at 16:59
  • It'll be used to pass the file names to another step – jdids Dec 03 '19 at 17:05
  • OK, that means your function is also a step. Since you started with JavaScript: the docs say you can [do so](https://help.pentaho.com/Documentation/8.2/Products/Data_Integration/Transformation_Step_Reference/Modified_Java_Script_Value), and JS file scanning is presented e.g. [here](https://stackoverflow.com/questions/31274329/get-list-of-filenames-in-folder-with-javascript) – Curiosa Globunznik Dec 03 '19 at 17:08
  • I didn't quite get it ... the file list input is on a C:/.. directory, correct? If so, you don't need JavaScript or Java at all. – Cristian Curti Dec 03 '19 at 19:13
  • The reason I want to do it this way is performance: there will be many files in this directory and I want to trim things down. I have something written in PowerShell that does this, but Pentaho isn't receiving the PowerShell results as expected. – jdids Dec 03 '19 at 19:32
  • Have you tried using built-in steps? Get file names > Filter rows? – Cristian Curti Dec 03 '19 at 19:33
  • I wanted to avoid this for performance reasons, as there will be a lot of files it will need to filter through. – jdids Dec 04 '19 at 17:25

2 Answers


There's a step called "Get file names". You just need to provide the path you want to poll. It can also scan recursively and return only filenames that match a given filter, and its Filters tab lets you show only folders, only files, or both.

nsousa
  • This is what I'm trying to avoid, because there are too many files that will be in this directory. I'm trying to make sure the process is as light as possible. Thanks for the recommendation – jdids Dec 04 '19 at 17:23

nsousa's answer would be the easiest. After you get your file list, you can use a Filter rows step on the lastmodifiedtime field returned by Get file names. That's 2 steps, or 3 if you want to format the returned date/time into something easier to sort and filter. This is the approach I use, and it is generally faster than the downstream transformations can keep up with.
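
To make the filter-then-format idea concrete, here is a rough sketch of the equivalent logic in plain JavaScript. It is not Pentaho configuration: `filterOlderThan` is a hypothetical helper, and the row shape (a `filename` string plus a `lastmodifiedtime` Date) is an assumption modeled on the fields the Get file names step returns.

```javascript
// Keep only rows modified before `cutoff`, then format the date as YYYY-MM-DD.
// Row shape { filename, lastmodifiedtime } is an assumption for illustration.
function filterOlderThan(rows, cutoff) {
    return rows
        .filter(function (row) {
            // The "Filter rows" part: lastmodifiedtime < cutoff
            return row.lastmodifiedtime.getTime() < cutoff.getTime();
        })
        .map(function (row) {
            // The optional third step: a sortable date string (UTC)
            return {
                filename: row.filename,
                modified: row.lastmodifiedtime.toISOString().slice(0, 10)
            };
        });
}
```

In the actual transformation the comparison value would come from the parameter or the Table Input row rather than a hard-coded date.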

mxdog