
We have a simple JSON feed which provides stock/price information at a certain point in time.

e.g.

{t0, {MSFT, 20}, {AAPL, 30}}
{t1, {MSFT, 10}, {AAPL, 40}}
{t2, {MSFT, 5}, {AAPL, 50}}
  1. What would be the preferred mechanism to store/retrieve this data and to plot a graph based on it (say, for MSFT)? Should I use Redis or MySQL?
  2. I would also want to show the latest entries to all users in the portal as and when new data is received. The data could be retrieved every minute. Should I use node.js for this?

Ours is a Rails application, and I would like to know which libraries/databases I should use to model this capability.

Rpj
  • This may be helpful regarding Redis use cases: http://oldblog.antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html – benjaminjosephw Mar 13 '14 at 22:25
  • I had to implement a data warehouse in our company and I started with BLOBs in MySQL. We query the DB every night, a few hundred thousand queries per hour, and the system responds (over the LAN) within 70ms. I think Redis & Co. (e.g. KDB from Kx, http://kx.com) are designed to find similar data over a huge range (like: get all PX_LAST, PX_MID, BID, ASK on 18-Mar-2014 at 16:15:10 [EDT]) or to do some statistical analysis (STD, EWMA, CORR, AR). – Markus Mar 18 '14 at 21:00

5 Answers

  1. Depends on the traffic and the data. If the data is relational, meaning it is formally described and organized according to the relational model, then MySQL is better. If most of the queries are gets and sets with key->value, meaning you are going to fetch the data using one key and you need to support many clients and many sets/gets per minute, then definitely go with Redis (see the sketch after this list). There are many other NoSQL DBs that might fit; have a look at this post for a great review of some of the most popular ones.

  2. So many ways to do this... If getting an update once a minute is enough, have the client make an AJAX call every minute to fetch the updated data, and then you can build your server side using PHP, .NET, Java servlets or node.js; again, it depends on the expected user concurrency. PHP is very easy to develop with, while node.js can support many short I/O requests. Another option you might want to consider is server push (Node's socket.io, for example) instead of the client AJAX call. In this way the client will be notified immediately when there is an update.
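
A minimal sketch of the key->value plus pub/sub idea in Ruby, assuming the redis gem; the key and channel names here are made up for illustration:

require 'redis'
require 'json'

redis = Redis.new

# Keep the latest quote per symbol under its own key
# ("quote:<symbol>" is just a naming choice for this sketch).
def store_quote(redis, symbol, price, timestamp)
  redis.set("quote:#{symbol}", { price: price, at: timestamp }.to_json)
end

# Fetch the latest quote for one symbol.
def latest_quote(redis, symbol)
  raw = redis.get("quote:#{symbol}")
  raw && JSON.parse(raw)
end

# Notify any subscribers (e.g. a socket.io bridge) that a quote changed.
def publish_update(redis, symbol, price, timestamp)
  redis.publish("quotes", { symbol: symbol, price: price, at: timestamp }.to_json)
end

store_quote(redis, "MSFT", 20, "t0")
publish_update(redis, "MSFT", 20, "t0")
puts latest_quote(redis, "MSFT")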

Personally, I like both node.js and Redis and have used them in a couple of production applications, supporting many concurrent users on a single server. I like node since it's easy to develop with and supports many users, and Redis for its amazing speed and handling of concurrent requests. Having said that, I also use MySQL for saving relational data, and PHP servers for fast development of APIs. Each has its own benefits.

Hope you'll find this info helpful.

Kuf.

Kuf

As Kuf mentioned, there are so many ways to go about this, and it really does depend on your needs: low latency, data storage, or ease of implementation.

Redis will most likely be the best solution if you are going for low latency and something easy to implement. You can use Pub/Sub to push updates to clients (e.g. via Node's socket.io) in real time, and run a second Redis instance to store the JSON data as a sorted set, using the timestamp as the score. I've used the same approach with much success for storing time-based statistical data. The downside to this solution is that it is resource (i.e. memory) expensive if you want to store a lot of data.
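
A rough sketch of that sorted-set approach in Ruby, assuming the redis gem; the "prices:<symbol>" key naming is made up for illustration:

require 'redis'
require 'json'

redis = Redis.new

# Add one observation, using the Unix timestamp as the sorted-set score.
def record_price(redis, symbol, price, timestamp)
  redis.zadd("prices:#{symbol}", timestamp, { t: timestamp, price: price }.to_json)
end

# Fetch all observations for a symbol within a time window, oldest first.
def prices_between(redis, symbol, from_ts, to_ts)
  redis.zrangebyscore("prices:#{symbol}", from_ts, to_ts).map { |raw| JSON.parse(raw) }
end

record_price(redis, "MSFT", 20, 1_394_668_800)
record_price(redis, "MSFT", 10, 1_394_668_860)
prices_between(redis, "MSFT", 1_394_668_800, 1_394_668_860)
# => [{"t"=>1394668800, "price"=>20}, {"t"=>1394668860, "price"=>10}]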

If you are looking to store a lot of data in JSON format and want to pull data every minute, then using ElasticSearch to store/retrieve it is another possibility. You can use ElasticSearch's range query to search on a timestamp field, for example:

"range": {
   "@timestamp": {
       "gte": date_from,
       "lte": now
   }
}

This adds the flexibility of using an extremely scalable and redundant system, storing larger amounts of data, and a RESTful real-time API. 
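
For example, issuing that query from Ruby with the elasticsearch gem might look roughly like this; the stocks index and document layout are assumptions for the sketch:

require 'elasticsearch'

client = Elasticsearch::Client.new

date_from = '2014-03-13T00:00:00'

# Fetch every document whose @timestamp falls between date_from and now
# (the 'stocks' index name is hypothetical).
response = client.search(
  index: 'stocks',
  body: {
    query: {
      range: {
        '@timestamp' => { gte: date_from, lte: 'now' }
      }
    }
  }
)

response['hits']['hits'].each { |hit| puts hit['_source'] }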

Best of luck!

kai

If you have a lot of data to keep updated in real time, like stock ticker prices, the solution should involve the server pushing to the clients, not the clients continually hitting the server for updates. A publish/subscribe (pub/sub) model over WebSockets might be a good choice at the moment, depending on your client requirements.

For plotting data received over WebSockets, there is already a question about that here.

Ruby-toolbox has a category called HTTP Pub Sub, which might be a good place to start. Whether MySQL or Redis is better depends on what you will be doing with it aside from just streaming stock prices. Redis may be the better choice for performance. Note also that websocket-rails, just as an example, assumes Redis if you were to use it.
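
As a rough illustration only, pushing a new quote out to subscribed browsers with websocket-rails could look something like this; the :ticker channel and :price_update event names are made up:

# Somewhere in the Rails app, e.g. called by the job that ingests the feed
# every minute. Clients subscribed to the :ticker channel receive the event
# (channel and event names are placeholders for this sketch).
class QuoteBroadcaster
  def self.broadcast(symbol, price, timestamp)
    WebsocketRails[:ticker].trigger(:price_update,
                                    symbol: symbol,
                                    price: price,
                                    at: timestamp)
  end
end

QuoteBroadcaster.broadcast("MSFT", 20, Time.now.to_i)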

I would not recommend a simple JSON API (non-pubsub) in this case, because it will not scale as well (see this answer), but if you don't think you'll have many clients, go for it.

Gary S. Weaver

Since you're basically storing JSON data...

Postgres has a native JSON datatype

MongoDB might also be a good fit, since it stores JSON documents as BSON.

But if it's just serving data, even something as simple as memcached would suffice.
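
As a rough sketch of the Postgres option, assuming Rails 4 and a hypothetical quotes table with a native json column:

# Migration: the quotes table and payload column are assumptions
# for this sketch, not an established schema.
class CreateQuotes < ActiveRecord::Migration
  def change
    create_table :quotes do |t|
      t.string :symbol
      t.json   :payload    # e.g. {"price": 20, "at": "t0"}
      t.timestamps
    end
  end
end

class Quote < ActiveRecord::Base
end

# Storing a feed entry and reading it back:
Quote.create!(symbol: "MSFT", payload: { price: 20, at: "t0" })
Quote.where(symbol: "MSFT").order(:created_at).last.payload["price"]  # => 20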

engineerDave

Cube could be a good example for reference. It uses MongoDB for data storage.

For plotting time series data, you may try out cubism.js.

Both projects are from Square.

Chaoyu