16

I have a collection with 10M documents covering 6000 stocks, and the stock name is indexed. When I subscribe to a new stock, Meteor hangs for more than 10 seconds to fetch the roughly 3000 documents for that stock. Also, after several stocks are subscribed, Meteor hangs with 100% CPU usage. Meteor looks really slow at syncing a "big" collection. Actually my app is read-only. I am wondering if there is a way to speed up Meteor for read-only clients? I am also wondering if creating a separate collection for each stock would help?

poordeveloper
  • 2,272
  • 1
  • 23
  • 36

5 Answers

14

Meteor is pushing the entire dataset to your client.

You can turn off autopublish by removing the autopublish package:

meteor remove autopublish

Then create a specific subscription for your client.

When you subscribe you can pass a session variable as an argument, so on the client you do something like:

var sub = Meteor.autosubscribe(function () {
    Meteor.subscribe('channelname', Session.get('filterval'));
});

On the server you use the argument to filter the result set sent to the client, so that you are not piping everything all at once. You segment the data in some fashion using a filter.

Meteor.publish('channelname', function (filter) {
    return Collection.find({field: filter});
});

Now, whenever you change the filterval on the client using Session.set('filterval', 'newvalue'); the subscription will be changed automatically, and the new dataset will be sent to the client.
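
For example, the Session variable could be changed from a hypothetical event handler (the template and selector names here are invented for illustration):

Template.stocklist.events({
    'click .stock': function () {
        // Triggers the autosubscribe block above to re-subscribe with the new filter.
        Session.set('filterval', this.name);
    }
});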

You can use this as a means of controlling how much and what data is sent to the client.
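
For instance, the publish function can also restrict which fields are sent and cap the number of documents; a minimal sketch (the extra field names and the limit value are placeholders, not from the question):

Meteor.publish('channelname', function (filter) {
    // Send only the fields the client renders, and at most 1000 documents.
    return Collection.find({field: filter},
                           {fields: {field: 1, price: 1},
                            limit: 1000});
});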

As another poster said, you really have to ask if this is the best tool for this job. Meteor is meant for relatively small datasets that are updated in real-time in (potentially) two directions. It is heavily optimised and has a ton of scaffolding for that use case.

For another use case (such as a huge read-only dataset) it may not make sense. It has a lot of overhead that provides functionality you are not going to use, and you'll end up writing extra code to get the functionality you do need.

Josh Wulf
  • 4,727
  • 2
  • 20
  • 34
13

I was struggling with the same issue. In my case I only had to sync ~3000 records, around 30KB total. After weeks of trying I eventually realized that the sync was not the issue, but seemingly the LiveHTML updates that happened while syncing.

I was able to reduce my page load from 10 seconds for 300 (filtered) records to less than 2 seconds for all 3000 records by disabling template updates during the initial page load. I accomplished that by adding a condition to the function that defined the template content:

Before (10s page load for 300 records being published by the server):

Template.itemlist.items = function () {
    return Item.find({type: 'car'},
                     {sort: {start: -1},
                      limit: 30});
};

After (2s page load for 3000 records published by the server):

Template.itemlist.items = function () {
    if (Session.get("active")) {    
        return Item.find({type: 'car'},
                         {sort: {start: -1},
                          limit: 30});
    } else {
        return [];
    }
};

To "activate" the session only once the data was loaded, I added:

Deps.autorun(function () {
    Meteor.subscribe("Item", 
                     {
                         onReady: function() {
                             Session.set("active", true);
                         }
                     });
});
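
In newer Meteor releases Deps was renamed Tracker, so the same pattern would presumably be written as:

Tracker.autorun(function () {
    Meteor.subscribe("Item",
                     {
                         onReady: function() {
                             Session.set("active", true);
                         }
                     });
});
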
Christian Fritz
  • 20,641
  • 3
  • 42
  • 71
10

While this is a scale issue and can probably be improved, it should be noted that you are using the wrong technology for the task: Meteor is meant for interaction between clients, not for retrieving tons of read-only, time-sensitive data. A status-tracking screen might still make some sense, but time-critical data in huge amounts certainly does not...

The whole Meteor stack introduces an extreme overhead compared to a simple implementation in any native stack; honestly, I would even take into account the overhead that Java or C# introduce and think twice when choosing between those and languages like PHP and C++. Languages like Ruby, Python and Node.js are really a different story; they are made for rapid prototyping, but in terms of latency and throughput they lag behind because of interpretation and JIT overhead, not to mention the overhead that some non-native approaches to doing things add.

TL;DR: Use the right tools for the job, or you'll cut your fingers...

Dan Dascalescu
  • 143,271
  • 52
  • 317
  • 404
Tamara Wijsman
  • 12,198
  • 8
  • 53
  • 82
  • I personally would like this to work with that kind of load. Editing a dataset of about 10 MB at a time should not be a problem. The current issue seems to be that syncing the first 4-5 MB is really fast and then it slows down by a lot. – Thierry Mar 01 '13 at 15:27
  • 1
    Hi Tom, I appreciate the work you are doing with win.meteor.com! Regarding the issue above, can't this be managed in Meteor as suggested below? – ChatGPT Mar 15 '13 at 06:59
  • 1
    @MaxHodges: I didn't say you "could not", I said that it is just "wrong"; you can get quite decent performance if you use the technology in the right way, but you can't reach the performance that low-level languages can offer you. If you can afford the slight delay in data, why not; but once you get to time-, life- and money-critical stuff you really need to reconsider... – Tamara Wijsman Mar 15 '13 at 09:46
  • OK, I see your point, but the poster was just asking if there is any way to speed things up, not really demanding real-time performance. – ChatGPT Mar 15 '13 at 10:46
  • 1
    I have a similar problem but at a smaller scale. It takes Meteor a few seconds to return results from this collection of only 2K documents. Wish it could do something smart like quickly returning the few records it needs, then syncing the rest in the background for future queries http://kanjifinder.whiterabbitpress.com/ – ChatGPT Mar 15 '13 at 10:48
  • [Cython](http://cython.org/) turns annotated Python into C. If you don't like to annotate your Python, try [PyPy](http://pypy.org/), [Nuitka](http://nuitka.net/), or [Shed Skin](http://shedskin.github.io/). – Cees Timmerman Feb 23 '16 at 17:37
2

With autopublish enabled you may see a performance hit with large collections of documents in MongoDB. You can address this by removing autopublish and writing code that publishes only the relevant data instead of the entire database.
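
A rough sketch of that pattern (the Prices collection, the stock field, and the publication name are made up for illustration):

// Server: publish only the documents for one stock.
Meteor.publish('stockPrices', function (symbol) {
    return Prices.find({stock: symbol});
});

// Client: subscribe to just the stock currently being viewed.
Meteor.subscribe('stockPrices', 'GOOG');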

The docs also cover managing the cache manually:

Sophisticated clients can turn subscriptions on and off to control how much data is kept in the cache and manage network traffic. When a subscription is turned off, all its documents are removed from the cache unless the same document is also provided by another active subscription.
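
In practice that means keeping the handle returned by Meteor.subscribe and stopping it once the data is no longer needed, for example (continuing the hypothetical stockPrices publication above):

// Keep the subscription handle so its documents can be released later.
var handle = Meteor.subscribe('stockPrices', 'GOOG');

// Later, when this stock is no longer displayed:
handle.stop();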

Additional performance improvements to Meteor are currently being worked on, including a DDP-level proxy to support a "very large number of clients". You can see more detail on this in the Meteor roadmap.

ChatGPT
  • 5,334
  • 12
  • 50
  • 69
1

I love Meteor's simplicity. I just stopped using a local MongoDB collection to avoid the overhead of syncing, and the performance looks really good.

# Register a custom client-side store for the "prices" collection on the DDP
# connection, so incoming documents drive the chart directly instead of being
# written into a local Minimongo collection.
Meteor.default_connection.registerStore "prices",
  beginUpdate: ->
  update: (msg) ->
    updateChart(msg.set)
  endUpdate: ->
  reset: ->

For newer versions of Meteor, the following works:

  Meteor.default_connection.registerStore collection, 
    constructor: (@update) ->
    # Called at the beginning of a batch of updates.
    beginUpdate: ->
    update: (msg) ->
      update(msg.fields, msg.id) if msg.fields
    endUpdate: ->
    reset: ->
poordeveloper
  • 2,272
  • 1
  • 23
  • 36