
I see a lot of examples that first create an array of vertices and then parallelize it to make an RDD, but if I have huge data, how would I handle it? I don't think I can create an array of, say, 1 million vertex rows.

Another post, Spark GraphX - How can I read from a JSON file in Spark and create a graph from the data?, also suggests using an array. Correct me if I am wrong, but again I don't think that would work at scale.

Thanks in advance.

Tara

1 Answer


If your data is in a file, then you can create an RDD directly on top of it:

val rdd : RDD[String] = sparkContext.textFile("/path/to/file")

and then transform it into a VertexRDD or EdgeRDD.
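For example, here is a minimal sketch of that transformation, assuming a hypothetical edge-list file where each line holds a pair of vertex IDs like "1 2". `Graph.fromEdges` builds the vertex set from the edge endpoints, so no in-memory array is ever needed:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.{Edge, Graph}

// Read the raw lines as a distributed RDD (no driver-side array)
val lines: RDD[String] = sparkContext.textFile("/path/to/file")

// Parse each line "srcId dstId" into an Edge; the Int is an edge attribute (weight 1)
val edges: RDD[Edge[Int]] = lines.map { line =>
  val fields = line.split("\\s+")
  Edge(fields(0).toLong, fields(1).toLong, 1)
}

// fromEdges derives the vertices from the edges, giving each the default attribute 0
val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)
```

If the file is already a plain whitespace-separated edge list, `GraphLoader.edgeListFile(sparkContext, "/path/to/file")` does this parsing for you in one call.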

Hlib