
We are building an application, parts of which may be distributed, but do not have to be. For that we'd like to use an existing framework for remote calls. To avoid implementing everything twice, we'd like to use the same mechanism for calls on the same machine, within the same process.

Does anyone know the performance / latency penalty we'll incur when using such a framework instead of calling through a vtable directly? Are there comparisons available?

The system should be portable across Windows and Linux.

Regards Tobias

Tobias Langner
  • Take a look at http://www.zeromq.org/, they provide an efficient abstraction layer over in-process and TCP communication. In my experience, it proved to be very fast. I don't know CORBA or Thrift well enough to compare their performance. – Matthieu Rouget May 29 '13 at 07:48

5 Answers


omniORB has long had a co-located call shortcut that makes direct calls, and starting with version 4 it offers a proprietary POA policy that bypasses even more of the required CORBA behavior, making a co-located call almost as fast as a direct virtual call. See the omniORB wiki and search for "Shortcut local calls." Unfortunately this doesn't seem to be in the official docs, at least not that I could find.

Brian Neal

What is common to most communication frameworks I'm aware of is that they always serialize, send, and deserialize, which will always be a performance hit compared to passing references to other threads and accessing the data directly (with or without a mutex). The hit needn't be dramatic if responsibilities are assigned wisely to minimize communication.

Note that with these sorts of architectural choices, performance is only one aspect to consider. Others are security, stability, flexibility, deployment, maintainability, licensing, etc.

stefaanv
  • yes, I know. But if the performance is not sufficient for our use case, the rest doesn't really matter. My problem is: how "dramatic" is the overhead? – Tobias Langner May 29 '13 at 08:13
  • My answer still remains: there is serialization, so there will be overhead, unless you find a framework which guarantees to avoid this when used intraprocess. I can't help with a performance test of the different frameworks. – stefaanv May 29 '13 at 08:44
  • that's why I asked whether someone has already done it. Btw, TAO describes such an optimization - but I haven't tried it yet. – Tobias Langner May 29 '13 at 08:49
  • I worked with omniORB, which doesn't use sockets for internal communication (I think pipes are used), but still uses serialization. The same may be true for TAO. With CORBA, communication always goes through an IDL interface, so at the very least you will have to fill in that data... – stefaanv May 29 '13 at 09:09
  • my preferred way would be having something close to a vtable-call in case it's in-proc. But thanks for the insight on omniorb. – Tobias Langner May 29 '13 at 09:53
  • When using the through POA or direct optimization in TAO a collocated call is really nothing more than just a few virtual method calls, no serialization happens. – Johnny Willemsen May 29 '13 at 14:07
  • @JohnnyWillemsen: nice to hear. I'm a bit surprised based on how we worked with Corba, but who am I to argue with someone from TAO ;-). Maybe a more detailed answer about this would be more appropriate, even if it is a bit late. I'll look into it when needed. – stefaanv May 29 '13 at 16:36
  • @stefaanv When enabling collocation we just pass the data directly from client to server, there is an optimized invocation path where we don't have to marshal the data at all. Two flavors exist, through POA does a check in the POA whether the servant is active, direct directly goes to the servant. Besides that optimization you can also plugin special transports so that you are not using ethernet as wire but you can use for example also shared memory, UDP, multicast, etc. Also the recent versions of TAO support ZIOP which is an addition that allows compression of data when marshaled. – Johnny Willemsen May 30 '13 at 15:16
  • ORBexpress is another example of a CORBA ORB that just makes a direct C++ call when the call is in-process. It does not perform marshalling in this case. – Brian Neal May 30 '13 at 20:53
  • @TobiasLangner The site is up again, there was a disk crash at WashU – Johnny Willemsen May 31 '13 at 08:56
  • I don't think this answer is right. There are at least 3 ORBs that I know of that skip the marshalling / demarshalling steps when a call is co-located. – Brian Neal Jun 01 '13 at 17:20
  • I just accepted the answer because it started the discussion - I'll upvote any useful answer. Unfortunately, there's no way to accept multiple answers. – Tobias Langner Jun 01 '13 at 23:31

From ZeroMQ / Learn the basics:

In 2011, CERN (the European Organization for Nuclear Research) compared CORBA, Ice, Thrift, ZeroMQ, YAMI4, RTI, and Qpid (AMQP). Read their analysis and conclusions. (PDF)

Which might just be the comparison you were after. (Found thanks to Matthieu Rouget's comment.)

I'd also pitch in that, while some ORBs let you skip the marshalling, you still can't avoid the dynamic memory allocation, which is what really matters for performance. (CPUs today are insanely fast, memory access is slow, and asking the OS to allocate a memory page is really slow.)

So where in C++ you might simply return a const string &, CORBA's C++ binding forces you to dynamically allocate and free a string or data structure (whether via the return type or an out parameter). This isn't significant when the method call crosses a process or network boundary anyway, but in-process it becomes quite significant compared to plain C++.

Another 'gotcha' we were burnt by is that you can't define mutually recursive structures (i.e. a struct 'A' that includes a 'B' which includes an 'A' again). This meant we had to convert those structures to interfaces, which allocates a CORBA servant "server side" (in-process) per structure, which is very memory-heavy. I gather there are advanced tricks to avoid actually creating servants, but ultimately we wanted to get away from CORBA altogether, not dig ourselves in deeper.

Especially in C++, the memory management is very fragile and difficult to program correctly. (See The Rise and Fall of CORBA, section 'Complexity'.) I attribute many person-years of additional effort to this technology choice.

I'd be curious to hear how you got on & what you adopted.

Luke Usherwood

One of several reasons for the creation of IBM's System Object Model was CORBA. IBM SOM is a "local CORBA", and IBM DSOM is an implementation of CORBA.

You should probably evaluate somFree.

Another option is UNO (from OpenOffice.org). I can't say I like UNO; it's worse, but it's more mature than the long-forgotten SOM. UNO's local (in-process) ecosystem is separated into partitions by programming language, C++ and Java being the most common. There is no serialization, but the preferred mechanism for inter-partition interaction is late binding (Java proxy -> Java dispatch -> C++ dispatch -> C++ object), rather like IDispatch in OLE, although direct bindings can also be made (Java proxy -> C++ object).

OCTAGRAM

Ice from ZeroC definitely supports collocated invocations, in which marshalling of data is avoided. You can find details in the documentation on their site: http://doc.zeroc.com/display/Ice/Location+Transparency. A collocated call still has some overhead compared to a virtual method call; unfortunately I don't have actual numbers, and it also depends on conditions, e.g. how many servants are registered in a particular adapter.

Slava