I am doing some operations with GraphX and want to route the results of a single pass over the triplets into two different RDDs, something like this pseudocode:

val ans = graph.triplets.map(
    e => {
        if (condition1) {
            return ans_1 to RDD_1
        }
        else if (condition2) {
            return ans_2 to RDD_2
        }
    }
)

I know I can traverse graph.triplets twice to get 2 different RDDs, like this:

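// First pass: emit ans_1 only for triplets that satisfy condition1.
// flatMap with Option drops the non-matching triplets instead of returning Unit.
val RDD_1 = graph.triplets.flatMap(
    e => {
        if (condition1) Some(ans_1)
        else None
    })
// Second pass: emit ans_2 only for triplets that satisfy condition2.
val RDD_2 = graph.triplets.flatMap(
    e => {
        if (condition2) Some(ans_2)
        else None
    })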

However, to improve efficiency I want to do it in a single pass, as depicted above. How can I achieve this?

Litchy
  • @RameshMaharjan If so, how can I create this kind of tuple, what is its structure? Because RDD_1 and RDD_2 are not the same size. – Litchy May 07 '18 at 05:57
  • @RameshMaharjan I added the 2-run version; it is obviously slower because it runs one more traversal – Litchy May 07 '18 at 06:33
  • @RameshMaharjan Must it have an else expression? The official Spark example does not, in `spark/examples/graphx/AggregateMessagesExample.scala` – Litchy May 07 '18 at 06:44
  • why don't you use filter instead of map. filter two times as you are doing with map. that would be more efficient than map. – Ramesh Maharjan May 07 '18 at 07:41
  • Possible duplicate of [How do I split an RDD into two or more RDDs?](https://stackoverflow.com/questions/32970709/how-do-i-split-an-rdd-into-two-or-more-rdds) – Alper t. Turker May 07 '18 at 09:52
  • @user9613318 thank you. But that is in Python and relies on a Python language feature. How can I do it in Scala? – Litchy May 07 '18 at 10:04
  • @Litchy I don't think that the code is that important there. More the conclusion, that if you want two RDDs, you'll need two separate transformations. – Alper t. Turker May 07 '18 at 10:10
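
Following the conclusion in the comments, one possible sketch is to compute the tagged results once, persist them, and then split them with two cheap transformations. Everything specific here is an assumption made for illustration: the `Graph[Long, Int]` type, the helper name `splitTriplets`, the predicates `e.attr > 0` and `e.attr < 0` standing in for condition1 and condition2, and the `String` and `Int` results standing in for ans_1 and ans_2.

```scala
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: the graph type, predicates, and result types are
// placeholders for condition1/condition2 and ans_1/ans_2 in the question.
def splitTriplets(graph: Graph[Long, Int]): (RDD[String], RDD[Int]) = {
  // Single traversal of the triplets: tag each result as Left (destined for
  // RDD_1) or Right (destined for RDD_2); triplets matching neither are dropped.
  val tagged: RDD[Either[String, Int]] = graph.triplets.flatMap { e =>
    val out: Iterator[Either[String, Int]] =
      if (e.attr > 0) Iterator(Left(s"${e.srcId}->${e.dstId}")) // condition1 => ans_1
      else if (e.attr < 0) Iterator(Right(e.attr))               // condition2 => ans_2
      else Iterator.empty
    out
  }.persist(StorageLevel.MEMORY_AND_DISK)

  // Two separate transformations are still required to obtain two RDDs,
  // but both read from the persisted tagged RDD.
  val rdd1: RDD[String] = tagged.collect { case Left(a) => a }
  val rdd2: RDD[Int] = tagged.collect { case Right(b) => b }
  (rdd1, rdd2)
}
```

This does not avoid the second transformation. The intent is only that the per-triplet work runs once, as long as the persisted partitions are not evicted and recomputed. Whether that beats two plain passes depends on how expensive the per-triplet computation is compared with the cost of persisting.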

0 Answers