
I see a lot of examples that first create an array of vertices and then parallelize it to make an RDD, but if I have huge data, how would I handle it? I don't think I can create an array of, say, 1 million vertex rows.

Another post, Spark GraphX - How can I read from a JSON file in Spark and create a graph from the data?, also suggests using an array. Correct me if I am wrong, but again I don't think that would work at scale.

Thanks in advance.

Tara

1 Answer


If your data is in a file, then you can create an RDD directly on top of it:

val rdd : RDD[String] = sparkContext.textFile("/path/to/file")

and then transform it into a VertexRDD or EdgeRDD.
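For example, here is a minimal sketch of that transformation, assuming a hypothetical edge-list file where each line holds a pair of vertex IDs like "1 2". `Graph.fromEdges` builds the vertex set from the edge endpoints, so no in-memory array is ever needed:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.{Edge, Graph}

// Read the raw lines as a distributed RDD (no driver-side array)
val lines: RDD[String] = sparkContext.textFile("/path/to/file")

// Parse each line "srcId dstId" into an Edge; the Int is an edge attribute (weight 1)
val edges: RDD[Edge[Int]] = lines.map { line =>
  val fields = line.split("\\s+")
  Edge(fields(0).toLong, fields(1).toLong, 1)
}

// fromEdges derives the vertices from the edges, giving each the default attribute 0
val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)
```

If the file is already a plain whitespace-separated edge list, `GraphLoader.edgeListFile(sparkContext, "/path/to/file")` does this parsing for you in one call.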

Hlib