
I'm currently working with Node.js, and have built a socket that accepts data. I am attempting to process the data in a streaming fashion, meaning that I process the data (nearly) as quickly as I receive it. There is a rather significant bottleneck in my code, however, that is preventing me from processing as quickly as I'd like.

I've distilled the problem into the code below, removing the extraneous information, but it captures my issue well enough:

require('net').createServer(function (socket) {
    var foo = [];

    socket.on('data', function (data) {
    foo.push(data); // Accessing 'foo' causes a bottleneck
    });

}).listen(8080);

Changing the code in the data event handler improves performance considerably:

var tmpFoo = foo;
tmpFoo.push(data);
// Do work on tmpFoo

The problem is that I eventually need to access the outer (closure-scoped) variable to save information for the next data event, incurring the performance penalty along with it. I'd much prefer to process the data as I receive it, but there does not appear to be any guarantee that it will be a "complete" message, so I'm required to buffer.
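The buffering can be kept small by extracting complete messages as soon as they arrive and keeping only the unfinished remainder. A minimal sketch, assuming for illustration that messages are newline-delimited (the framing and the `extractMessages` helper are assumptions, not part of the original code):

```javascript
// Per-connection buffering: append each chunk, pull out every complete
// newline-delimited message, and keep the partial remainder buffered.
function extractMessages(state, chunk) {
    state.buffer = Buffer.concat([state.buffer, chunk]);
    var messages = [];
    var idx;
    while ((idx = state.buffer.indexOf(0x0a)) !== -1) { // 0x0a = '\n'
        messages.push(state.buffer.slice(0, idx).toString());
        state.buffer = state.buffer.slice(idx + 1);
    }
    return messages; // only complete messages; the remainder stays in state
}
```

Each connection would keep its own `state` object (e.g. `{ buffer: Buffer.alloc(0) }`), and the 'data' handler would call `extractMessages` and process whatever complete messages come back, so nothing accumulates beyond one partial message.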

So my questions:

  • Is there a better way to localize the variable, and limit the performance hit?
  • Is there a better way to process the data in a streaming fashion?
Julio
  • 2,261
  • 4
  • 30
  • 56
  • 1
    tmpFoo and foo are the same, so if you push to tmpFoo, it will be in foo as well... – dandavis Mar 09 '14 at 00:47
  • 1
    How do you know that that non-local reference causes a "bottleneck"? How have you measured that? I find that pretty unlikely; it's certainly more expensive than accessing a local variable, but by an amount of time measured in fractions of a microsecond. – Pointy Mar 09 '14 at 00:47
  • @Pointy Yes, I've measured it. While it's only a minimal delay for any given event, the speed/data throughput I'm processing causes this delay to add up quickly. – Julio Mar 09 '14 at 01:42
  • 2
    This does seem sort of unbelievable--`tmpFoo` is just a pointer to `foo`, so it still needs to be dereferenced just like `foo` when doing `tmpFoo.push`. If anything, I'd think using the temp variable would be *slower* given the extra instructions required to set up the pointer. How are you measuring performance? And are you definitely *only* changing that line of code? – sgress454 Mar 09 '14 at 02:15
  • I didn't think it was a problem either, so I ran down a few rabbit holes before getting to this line. I figured that it was the processing I was doing on `foo`, so I commented it out - same performance. As soon as I commented out the access to `foo` it immediately sped up. Localizing the variable (i.e. `tmpFoo`) maintained the speed improvement, but at the loss of the data on the next iteration. – Julio Mar 09 '14 at 02:29
  • The loss of data is what makes me feel like something else is going on. Setting `tmpFoo` to `foo` and then doing `tmpFoo.push` should also alter `foo`, since the one is just a pointer to the other. No data should be lost. Are you sure it's not instantiating multiple servers (and this multiple copies of `foo`?) – sgress454 Mar 09 '14 at 21:19
  • Can you provide a set of data the server receives ? – Libert Piou Piou Jul 06 '14 at 15:21
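The reference semantics the commenters describe can be checked in isolation (a trivial sketch):

```javascript
// Assigning an array to a second variable copies the reference,
// not the contents: both names point at the same underlying object.
var foo = [];
var tmpFoo = foo;
tmpFoo.push('data');

console.log(foo.length);     // 1 - the push is visible through foo
console.log(tmpFoo === foo); // true - no copy was made
```

So pushing to `tmpFoo` cannot by itself lose data or avoid the cost of the array: any measured difference must come from something else in the surrounding code.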

1 Answer


Don't use anonymous functions like that:

 createServer(function (socket) {

Define the functions separately and pass them by name, as follows:

 var foo = [];

 function createMyServer(socket) {
      socket.on('data', receiveDataFromSocket);
 }

 function receiveDataFromSocket(data) {
      foo.push(data);
 }

 require('net').createServer(createMyServer).listen(8080);
Adrian Preuss
  • 3,228
  • 1
  • 23
  • 43
  • This will make no measurable difference. – Pointy Mar 09 '14 at 00:46
  • Oh yes. learn it. http://stackoverflow.com/questions/80802/does-use-of-anonymous-functions-affect-performance – Adrian Preuss Mar 09 '14 at 00:49
  • 1
    That question is almost six years old. In a modern JavaScript runtime, there'll be very little difference. – Pointy Mar 09 '14 at 00:53
  • 2
    that example is not the same at all. defining a function in a for loop would create a new function each iteration whereas passing a function to createServer() would only create 1 function, no matter how many times that function gets called. i agree that using named functions is a better practice, but not for the stated reason. – dandavis Mar 09 '14 at 00:54
  • Depending on how much data is received via the stream, it can cause the exact same problem. Even if the link is old, you should still adhere to the principles and not hope that the underlying software (like the web browser or the Node.js backend) works around the problem for you. – Adrian Preuss Mar 09 '14 at 00:58
  • 1
    This doesn't really seem relevant in this case regardless of its age. In the OP's post, each function is created exactly once, which is in contrast to the Q&A you linked to, in which the function is being created inside a loop. @Pointy - even in the case of Node today, I think the question of the overhead of frequent function creation is related to the potential loss of compiler optimizations, most of which v8 does not apply until the second time a function is run, according to the engineering talks they have given (can be found online) - impact would vary with function complexity of course. – barry-johnson Mar 09 '14 at 01:33
  • @barry-johnson that may be true, but note that the *code* of a function object in a situation like that may very well be shared by all the Function *instances* - it's invariant, after all. – Pointy Mar 09 '14 at 01:40
  • 2
    @Pointy - it very well may be shared, indeed. I may stub out a quick benchmark for fun. In the trivial case (a function which simply adds two numbers) the impact of function creation versus just adding two numbers is .037 seconds per million calls (on my machine obviously - .002 vs .039 were the times), so I'm not losing sleep about it. :-) – barry-johnson Mar 09 '14 at 01:53
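barry-johnson's figure can be reproduced in spirit with a micro-benchmark along these lines (an illustrative sketch, not his exact code; absolute numbers depend on the machine and V8 version):

```javascript
var N = 1e6;

function add(a, b) { return a + b; }

// One named function, reused for every call.
console.time('named');
var sum1 = 0;
for (var i = 0; i < N; i++) sum1 = add(sum1, 1);
console.timeEnd('named');

// A fresh anonymous function allocated on every iteration.
console.time('fresh');
var sum2 = 0;
for (var j = 0; j < N; j++) {
    sum2 = (function (a, b) { return a + b; })(sum2, 1);
}
console.timeEnd('fresh');
```

Both loops compute the same sum; the timing difference is the per-call cost of creating a function object, which is small relative to almost any real per-message processing work.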