I see a lot of examples that first create the vertices in an array and then parallelize it into an RDD, but how would I handle huge data? I don't think I can build an in-memory array of, say, 1 million vertex rows on the driver.
Another post, Spark GraphX - How can I read from a JSON file in Spark and create a graph from the data?, also suggests using an array. Correct me if I am wrong, but again I don't think that approach would scale.
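For context, here is roughly what I would expect to do instead: read the data from files into RDDs directly, so the vertices are never materialized on the driver. This is only a sketch under made-up assumptions (the paths `vertices.txt`/`edges.txt` and the tab-separated `id<TAB>name` / `srcId<TAB>dstId` formats are hypothetical):

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Build the vertex RDD from a file instead of parallelizing an array.
// Assumes each line of vertices.txt looks like "<id>\t<name>" (hypothetical).
val vertices: RDD[(VertexId, String)] =
  sc.textFile("hdfs:///data/vertices.txt").map { line =>
    val fields = line.split("\t")
    (fields(0).toLong, fields(1))
  }

// Likewise for edges; assumes each line is "<srcId>\t<dstId>" (hypothetical).
val edges: RDD[Edge[Int]] =
  sc.textFile("hdfs:///data/edges.txt").map { line =>
    val fields = line.split("\t")
    Edge(fields(0).toLong, fields(1).toLong, 1)
  }

val graph: Graph[String, Int] = Graph(vertices, edges)
```

(For plain edge-list files there is also `GraphLoader.edgeListFile`.) Is something like this the right way to do it, or is there a better pattern?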
Thanks in advance.