I am trying to use Spark on an HPC-focused cluster with InfiniBand interconnects. The cluster does not support IPoIB. I came across the Spark-RDMA project from Ohio State University, but I cannot find anyone else working on this, nor any indication that Apache Spark will support IB in the future. Is there any other way to run a more recent version of Spark in an HPC environment where IB is the only network?
-
What is missing for Spark to be using IPoIB? – haggai_e Dec 14 '16 at 08:52
-
Sorry, I updated the question. I meant the cluster doesn't support IPoIB. If IPoIB were available, I would not have any problem at all. – M.Rez Dec 14 '16 at 08:56
-
I see. Anyway, I don't know of other attempts to use Spark with RDMA. – haggai_e Dec 14 '16 at 09:39
-
Since Spark is antithetical (in its implementation) to a shared memory system, I think it makes little sense to use it with a shared memory system. On the other hand, there are vendors pushing Spark onto mainframes. If you can convince a vendor that Spark's API is a selling point for their HPC platform, then that might be your ticket. I still believe that you'd have to re-implement many of the basics pretty radically while keeping pace with Spark's breakneck API development speed, which is a major challenge and probably why the attempt to support this was eventually stopped. – Rick Moritz May 15 '17 at 07:27
1 Answer
You can check Mellanox's reference guide for deploying RDMA over Converged Ethernet (RoCE) to accelerate Apache Spark 2.2.0 over a Mellanox 100 GbE network: https://community.mellanox.com/docs/DOC-3068
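For illustration only, here is a minimal sketch of how an RDMA shuffle plugin is usually wired into Spark through configuration properties. It assumes the guide relies on a pluggable shuffle manager such as Mellanox's open-source SparkRDMA; the class name below is taken from that project's README and should be verified against the plugin version you install. The jar path and the helper names `rdma_shuffle_conf` and `to_submit_args` are hypothetical.

```python
import shlex

# Assumption: shuffle manager class as documented by the SparkRDMA project.
# Check it against the plugin release you actually deploy.
RDMA_SHUFFLE_MANAGER = "org.apache.spark.shuffle.rdma.RdmaShuffleManager"


def rdma_shuffle_conf(plugin_jar):
    """Return Spark properties that put the plugin jar on the driver and
    executor classpaths and select the RDMA shuffle manager."""
    return {
        "spark.driver.extraClassPath": plugin_jar,
        "spark.executor.extraClassPath": plugin_jar,
        "spark.shuffle.manager": RDMA_SHUFFLE_MANAGER,
    }


def to_submit_args(conf):
    """Turn a property dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical location of the plugin jar on every node.
    jar = "/opt/spark-rdma/spark-rdma.jar"
    args = to_submit_args(rdma_shuffle_conf(jar))
    print("spark-submit " + " ".join(shlex.quote(a) for a in args))
```

This only replaces the shuffle data path. The guide targets RoCE, which runs over Ethernet, so whether the same setup works on an InfiniBand-only fabric without IPoIB would still need to be checked.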

Ian Chang