4

What is the best way to share data between verticles in Vert.x? I'm mostly interested in low-overhead, direct, concurrent access. The event system doesn't seem appropriate when multiple verticles need access to objects that are big enough that sending them as JSON isn't economical.

Simple example - consider an imaginary postal service using Vert.x. Say it has two verticles: one to figure out the next package to send and another to figure out the fuel spent in the previous hour by the vehicles delivering packages.

Say there are 1000s of packages at any given moment. A database could return these, but then it would need to return all of the packages, since the algorithms for determining which package to send / fuel spent are complex and are executed by two different verticles.

Some suggestions I've found so far are:

  • Use JSON.stringify / JSON.parse through vertx.getMap. This looks like quite an overhead to me, especially when things are updated often (e.g. a package location containing a GPS coordinate)
  • Use EHCache, Hazelcast, etc., but these suggestions mostly trail off with a "you can try" conclusion, without details
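The serialization overhead from the first bullet can be sketched in plain Java. This is only an illustration under assumptions: a `ConcurrentHashMap` stands in for a Vert.x shared map (which only accepts immutable values such as strings, hence the stringify), and the `Package` fields (`id`, `lat`, `lon`) and helper names are made up for this example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the stringify/parse round trip the question describes.
// Vert.x shared maps only accept immutable values (strings, numbers, ...),
// so complex objects must be serialized; a ConcurrentHashMap stands in
// for a shared map obtained via vertx.sharedData() here.
public class SharedMapSketch {
    static final Map<String, String> sharedMap = new ConcurrentHashMap<>();

    // Hypothetical Package reduced to an id and a GPS coordinate.
    static String serialize(String id, double lat, double lon) {
        return "{\"id\":\"" + id + "\",\"lat\":" + lat + ",\"lon\":" + lon + "}";
    }

    // Naive field extraction, standing in for JSON.parse.
    static double parseLat(String json) {
        int i = json.indexOf("\"lat\":") + 6;
        return Double.parseDouble(json.substring(i, json.indexOf(',', i)));
    }

    // Every GPS update pays a full serialize; every reader pays a full parse.
    static void updateLocation(String id, double lat, double lon) {
        sharedMap.put(id, serialize(id, lat, lon));
    }
}
```

The point of the sketch is that each frequent update (a moving GPS coordinate) re-serializes the whole entry, which is exactly the overhead the question worries about.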

Is there a canonical Vert.x solution here that I failed to recognize? I'm OK with dividing things in a different manner, having more verticles, lower / higher granularity, etc., if that's the way to go - i.e. this is more an architectural question with regard to the Vert.x model.

I'm also interested to see open source examples related to the above, if any.

levant pied
  • Hi there. What you are describing sounds to me like a classic case of needing a persistence layer that handles the caching and requests accordingly. I actually don't know why you would need to stringify into shared maps, but overall a set of multiple maps, rather than a single one, can help here for frequent updates. In general, though, I would probably approach this with a persistence module/verticle handling such requests, caching, and storage, reached through either maps or the EventBus. And I don't see that as an overhead. Cheers – INsanityDesign Aug 14 '14 at 09:23
  • Thanks @INsanityDesign. Stringifying is needed as maps cannot hold complex objects (e.g. an instance of a `Package` class, describing the package in great detail - destination address, weight, items within, etc.), so I need to serialize. Can you clarify the multiple maps approach you had in mind (probably best as an answer)? Also, when you say persistence - that would mean loading everything every time (an algorithm needs all the packages (1000s of them) available to do its job) - an event between the persistence and algo verticles would be huge, don't you think? – levant pied Aug 14 '14 at 12:04
  • I rather meant to split the data, as it seems you are more or less thinking about a big blob-like dataset where all information is stored. If this is the case, you will have such an overhead no matter what solution you pick. My suggestion would rather be to split the data logically across multiple shared maps, or have a persistence layer that handles changes, additions, etc. logically & communicates them to all attached verticles. I don't fully see how you would plan to have a big object constantly being changed & communicated everywhere. That's a big overhead no matter what channel you choose. – INsanityDesign Aug 14 '14 at 12:47
  • Thanks @INsanityDesign. "you are more or less thinking about a big blob-like dataset" not really - that's what the person in the first post I linked to suggested. I just need all the data in one place to run the algo. Similar to other problems - e.g. finding the exit in a maze which can change (i.e. walls, starting position, etc.). Say the maze is 1000x1000. Where would you store it for algo purposes (not persistence purposes)? Local field in all verticles that need it? EHCache instance? Something else? Do you have an example open-source Vert.x project tackling a similar problem? – levant pied Aug 14 '14 at 18:47
  • Wrong words ^^: what you described is what I meant. Let's say you need an always-updated 100x100 maze; then you should have it everywhere or centralized. If you have it centralized (e.g. in a Memcached), it gets big for changes & needs to be tx-safe. This is why I suggested having a verticle take care of this, calculate your complex algos, and collect the changes. If the algos are distributed, the object should be centralized through a verticle that takes care of tx-safety. Honestly, I can't see the issue in the described problem & would just do it. I don't know what you expect. Think slick! – INsanityDesign Aug 18 '14 at 16:43
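The "split the data logically across multiple shared maps" idea from the comments can be sketched in plain Java. Again this is a hedged illustration: two `ConcurrentHashMap`s stand in for separate Vert.x shared maps, locations are kept as immutable `"lat,lon"` strings (real Vert.x shared maps also require immutable values), and all names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of splitting one blob into multiple logical shared maps:
// bulky, rarely-changing package details in one map, and small,
// frequently-changing GPS positions in another.
public class SplitMapsSketch {
    // Stand-ins for two separate Vert.x shared maps.
    static final Map<String, String> packageDetails = new ConcurrentHashMap<>();
    static final Map<String, String> packageLocations = new ConcurrentHashMap<>();

    static void addPackage(String id, String detailsJson, double lat, double lon) {
        packageDetails.put(id, detailsJson);
        packageLocations.put(id, lat + "," + lon);
    }

    // Frequent GPS updates touch only the small location entry;
    // the large details entry is never re-serialized.
    static void updateLocation(String id, double lat, double lon) {
        packageLocations.put(id, lat + "," + lon);
    }
}
```

A verticle running the routing algorithm would read both maps; a verticle tracking fuel would only need the location map, so the hot-path updates stay cheap.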

1 Answer

0

The clustering story for Node applications is not great, especially if you do not want to use Java.

Since it sounds like you want the lowest-level control over data sharing, I'd recommend bridging event buses between the verticles using conventional WebSockets.

DoctorPangloss