I'm running a Spark program that reads a CSV file and creates object instances from its lines in parallel. My setup is similar to the code snippet below (from http://spark.apache.org/docs/latest/sql-programming-guide.html):
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            String[] parts = line.split(",");
            Person person = new Person();
            person.setName(parts[0]);
            person.setAge(Integer.parseInt(parts[1].trim()));
            return person;
        }
    }
);
If I wanted to assign a unique ID to each of these Person objects, how would I go about it, given that the mapping is done in parallel across partitions?
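One direction I've looked at is zipWithUniqueId(), which pairs each RDD element with a Long that is unique across the RDD (though not consecutive, unlike zipWithIndex(), which launches an extra job). Below is a minimal sketch of what I have in mind; note that the setId(long) setter on Person is my own assumption and is not part of the guide's example. Is this the idiomatic way, or is there a better approach?

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

// Pair every Person with a unique Long, computed per partition (no shuffle needed)
JavaPairRDD<Person, Long> withIds = people.zipWithUniqueId();

// Copy the generated ID onto the Person object itself
JavaRDD<Person> peopleWithIds = withIds.map(
    new Function<Tuple2<Person, Long>, Person>() {
        public Person call(Tuple2<Person, Long> tuple) throws Exception {
            Person person = tuple._1();
            person.setId(tuple._2());   // assumes Person has a setId(long) setter
            return person;
        }
    }
);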