0

Second post here and this one's really got me scratching my head. I have a function which processes an array to try to find similar data. The array contains 1410 elements which I consider to be a lot but nothing that Node or my computer shouldn't be able to handle.

My code is giving "Segmentation Fault: 11" error which I found was to do with memory access issues so I even want as far as to test my Mac's RAM but everything's fine. The segfault makes it very difficult to debug which is why I came here.

The code where something is going wrong is within here:

return matchings.map(matchArray => {
  const namesList = matchArray
    .map(matchItem => matchItem.name)
    .sort((a, b) => a.localeCompare(b))

  const symbolsList = matchArray
    .map(matchItem => matchItem.symbol)
    .sort((a, b) => a.localeCompare(b))

  return {
    name: common.getMode(namesList),
    symbol: common.getMode(symbolsList),
    matches: matchArray
  }
}).sort((a, b) => a.name.localeCompare(b.name))

Where matchings is the array I'm talking about. common.getMode(array) has this code:

array.sort()

const stats = {
  top: {
    name: '',
    freq: 0
  },
  current: {
    name: array[0],
    freq: 1
  }
}

for (let idxName = 1; idxName < array.length; idxName++) {
  const currentName = array[idxName]
  const lastName = array[idxName - 1]

  if (currentName === lastName) {
    stats.current.freq++
  } else {
    if (stats.current.freq > stats.top.freq) {
      stats.top.name = stats.current.name
      stats.top.freq = stats.current.freq
    }
    stats.current = {
      name: currentName,
      freq: 1
    }
  }
}

if (stats.current.freq > stats.top.freq) {
  stats.top.name = stats.current.name
  stats.top.freq = stats.current.freq
}

return stats.top.name

It's worth mentioning that when performed with an array of smaller size ~1000, the code works fine which leads me to believe it is not my code. There is also very little content online about Node's Segfault 11 which isn't helping.

Any ideas greatly appreciated!

James Middleton
  • 679
  • 1
  • 5
  • 8
  • Thanks for your response. The JSON of the non-working array is here: [link](https://pastebin.com/SnVJM7xN) and the working one is here: [link](https://pastebin.com/GUYMMs6S). Is there any way this could be due to a timeout in waiting for a promise because the function takes about 8 seconds before it gives the segfault? – James Middleton Sep 05 '17 at 07:45

1 Answers1

1

TL;DR remove stress from the call stack by using tail call optimization.

Edit (with explanation)

See Understanding Javascript Functions Execution... to understand the difference between call stack, memory heap and queue. Whereas the objects and variables live in heap land, the function calls are referenced in the call stack, which your 2nd dataset depleted (16'000 stack frames) ; so your algorithm couldn't keep up because it had no way to continue allocating new function calls.

See this StackOverflow answer, which points to further information about the call stack and this one too, which points to ways to get data on the heap.

Original answer

I may be completely off, but I'm curious to see if converting your loops to recursion could help the memory cope. I'd try it on my box but setting everything up is a hassle.

Could you try this? It uses spread operator and array destructuring, so you may have to add babel-preset-stage-0 to your project and a .babelrc file too.

Javascript

let common = {};
common.getMode = (arr, compare_fn) => {
  const compare = !!compare_fn ? compare_fn : (a, b) => a.localeCompare(b)
  arr.sort(compare)

  const stats = {
    top: {
      name: '',
      freq: 0
    },
    current: {
      name: arr[0],
      freq: 1
    }
  }

  for (let i=1, imax = arr.length ; i < imax ; ++i) {
    const currentName = arr[i]
    const lastName = arr[i - 1]

    if (currentName === lastName) {
      stats.current.freq++
    } else {
      if (stats.current.freq > stats.top.freq) {
        stats.top.name = stats.current.name
        stats.top.freq = stats.current.freq
      }
      stats.current = {
        name: currentName,
        freq: 1
      }
    }
  }

  if (stats.current.freq > stats.top.freq) {
    stats.top.name = stats.current.name
    stats.top.freq = stats.current.freq
  }

  return stats.top.name
};

const build_prop_list = (prop, input_array, output_array = []) => {
  if(input_array.length == 0) return output_array;
  else {
    const [current, ...tail] = input_array;
    const new_array = [...output_array, current[prop]];
    return build_prop_list(prop, tail, new_array);
  }
}

const work = (input_array, output_array = []) => {
  if(input_array.length == 0) return output_array.sort((a, b) => a.name.localeCompare(b.name));
  else {
    const [matchArray, ...tail] = input_array;

    const namesList = build_prop_list("name", matchArray);
    const symbolsList = build_prop_list("symbol", matchArray);

    const new_element = {
      name: common.getMode(namesList),
      symbol: common.getMode(symbolsList),
      matches: matchArray
    };

    const new_array = [...output_array, new_element];
    return work(tail, new_array);
  }
}

let result = work(insert_your_json_here);

Edit

You may also apply tail call optimization to your for loop in common.getMode(...). The first iteration's behavior is different though, because lastName doesn't reference the last name of the array (index: -1), but the first one. See if it fits your needs and you'd optimize your code a bit more.
This should replace the for loop in common.getMode(...).

  const feed = (input_array) => {
    if(input_array.length == 0) return;
    const [lastName, currentName, ...tail] = input_array;

    if (currentName === lastName) {
      stats.current.freq++
    } else {
      if (stats.current.freq > stats.top.freq) {
        stats.top.name = stats.current.name
        stats.top.freq = stats.current.freq
      }
      stats.current = {
        name: currentName,
        freq: 1
      }
    }

    return feed(tail);
  }(arr);
Sumi Straessle
  • 1,116
  • 12
  • 23
  • You seem to have solved it, thank you very much. Is there a limitation of Node which has caused this? I was monitoring memory usage while running my code and the memory seemed fine. I think Node should be able to handle this, don't you? – James Middleton Sep 05 '17 at 20:17
  • This is called tail-call optimization. Basically, you go from the stack growing proportionally to the number of its argument, to a stack that should grow to a fixed size (depending on how many times you branch within the function or call other functions). I'll try and reference some literature for you. I don't think it's dependent on Node. It's a problem that'll crop up on any languages and computer, IMHO. You can also optimize your loop within `common.getMode(...)`, but the behavior changes for the first iteration. I'll write it too as an edit. – Sumi Straessle Sep 05 '17 at 20:39
  • See [Tail call optimization in ECMAScript 6](http://2ality.com/2015/06/tail-call-optimization.html) and [JS ES6 Recursive Tail Call Optimization](https://medium.com/@mlaythe/js-es6-recursive-tail-call-optimization-feaf2dada3f6). The idea is to not branch into another function or evaluation at the end of your function, and use recursion to do the looping for you. You can also upvote the answer if it has been helpful to you. – Sumi Straessle Sep 05 '17 at 20:44
  • 1
    I have tried to upvote the answer, it's fantastic, however my reputation is below 15 at the moment. I'll check this out tomorrow because I'm too tired. Thanks again Sumi ! – James Middleton Sep 05 '17 at 22:03
  • I'm back Sumi. I've had a look at your edits properly and done plenty of research on the whole thing. I'm really happy that the code that you provided works, however I'm trying to actually figure out what caused the segfault, as this will help me better understand JS and V8. I've been able to confirm that there is no call stack overflow due to logging `new Error.stack` and only has two or three calls on the stack. I have also used `process.memoryUsage()` and compared `heapTotal` and `heapUsed` which shows I have ~5MB left of heap. – James Middleton Sep 07 '17 at 10:45
  • This means that it must be a limitation in the queue. Do you know a way of logging any sort of information on the queue? I have managed to create a stacktrace by using the npm package `node-segfault-handler` which gives me [this](https://pastebin.com/zRkWK7Ax) response. Have you any idea how I can go about debugging this? – James Middleton Sep 07 '17 at 10:48
  • @JamesMiddleton no, I don't. funny thing, it's not the call stack that 's overflowed. Have you tried removing and reinstalling `node_modules` with your original code? or using a different version of Node? I'd try and tweak the stack size (for investigation purposes) or implement a queuing mechanism and see where the original code breaks. – Sumi Straessle Sep 08 '17 at 10:02
  • I have tried reinstalling`node_modules`. Haven't tried the other suggestions but the last two are beyond my capabilities at the moment. I may come back to it in the future. Thanks very much for all of your responses :) – James Middleton Sep 11 '17 at 14:34