0

I am working with some xml files stored on S3. I use the xml2js module in Node to parse the xmls and then I extract the strings that have .jpg in them. I was using the filter method and then tried using my own for loop but that didn't shave any time off. Is there any faster way to write this section of the code or is this the fastest way to get it done. Any help appreciated.

using filter method:

//this took 52393ms
var file = JSON.stringify(data);
var arrayOfStrings = file.split('"');
var images = arrayOfStrings.filter(function(str) {
    return str.indexOf('.jpg') !== -1;
});
resolve(images);

using for loop:

//this took 52681ms
var file = JSON.stringify(data);
var arrayOfStrings = file.split('"');
var images =[];
for(let i = 0; i < arrayOfStrings.length; i++) {
    if(arrayOfStrings[i].indexOf('.jpg') !== -1) {
        images.push(arrayOfStrings[i]);
    }
}
resolve(images);

data looks like the following after I use file.split('"');

[ '{','rstuv',':{','options',':[{','![alt](CKrgUgiYMflaWnsGZ009.jpg)']];
nasoj1100
  • 528
  • 9
  • 15
  • 1
    You should try `for` with `var` instead of `let`. `let` will create a new variable for every iteration. – Rajesh Jun 14 '16 at 14:23
  • 4
    can you, please, provide an example of your initial data? – Foker Jun 14 '16 at 14:24
  • 2
    Use a profiler to see exactly where the time is spent. It could be the JSON serialisation that is the real culprit – alex Jun 14 '16 at 14:25
  • 2
    Do you really want the strings that *include* `.jpg`, or those that end with it? If the latter, using `.endsWith` is likely to be faster. – Bergi Jun 14 '16 at 14:30
  • @alex and others - I misread the requirement. Thanks for correcting. – Nikhil Aggarwal Jun 14 '16 at 14:30
  • 3
    @PatrickRoberts: I find it hard to believe that `let` would have 200% performance impairment. Especially when the scope is not used and it can trivially be optimised into the same code as `var`. – Bergi Jun 14 '16 at 14:32
  • @Bergi I recently answered a question that had a benchmark of `var` vs `let`. The loop performed at ~100ms for `var` while it performed ~300ms for `let`. Huge difference. Here's the link: http://stackoverflow.com/q/37792934/1541563 – Patrick Roberts Jun 14 '16 at 14:34
  • @PatrickRoberts If you carefully read the comments there, this depends a lot on the environment. Here, OP using node not the Chrome devtools console. – Bergi Jun 14 '16 at 14:41
  • @Bergi, if _you_ read the _question_ there carefully, the original test was performed in node, and is completely relevant here. – Patrick Roberts Jun 14 '16 at 14:43
  • using endsWith() shaves a full second off it. – nasoj1100 Jun 14 '16 at 14:45
  • @PatrickRoberts That test is with an empty loop running 10^8 times. The difference could end up in the noise once you start doing some real computations. – 1983 Jun 14 '16 at 14:52
  • 1
    @PatrickRoberts I get no better than a 2% improvement of `var` over `let` [here](https://jsfiddle.net/52sfddx8/). – 1983 Jun 14 '16 at 16:02
  • @FizzyTea you are correct. I even increased the amount of iterations to make it last 30 seconds and the difference is still minimal. I, of course, revoke my statements. – Patrick Roberts Jun 14 '16 at 17:07

1 Answers1

5
var file = JSON.stringify(data);
var arrayOfStrings = file.split('"');

Don't do that. If you want to search through data, keep it structured. Stringifying and then searching that string will not only get you a bunch of mistakes (when strings did contain a quote), lots of array elements that are not even proper strings, and is hardly any improvement over just reading in the original XML file as text and directly searching that.

Instead, just iterate (or recurse if necessary) through the data object (see Access / process (nested) objects, arrays or JSON for details), and filter for the strings (and property names?) in those locations where you expect them. This will be a lot faster.

Community
  • 1
  • 1
Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • it seems getting the xml's from S3 takes up the bulk of the time. But I did manage to shave another 7 or 8 seconds off. Thanks for your help. – nasoj1100 Jun 15 '16 at 12:46