Here is some data I want to read with Scala. The data looks like this:
userId,movieId
1,1172
1,1405
1,2193
1,2968
2,52
2,144
2,248
First I want to skip the first line (the header), then split each line into user and movie with split(",") and map it to (userID,movieID).
This is my first time trying Scala, and it has been driving me crazy. I wrote this code to skip the first line and split:
rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}.flatMap(line => line.split(","))
But the result is something like this:
1
1172
1
1405
1
2193
1
2968
2
52
I guess it's because of mapPartitionsWithIndex. Is there any way to correctly skip the header without changing the structure?
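As far as I can tell, the flattening in the output above comes from flatMap rather than from mapPartitionsWithIndex: flatMap concatenates every split array into one sequence, while map keeps one element per line. Below is a minimal sketch of that difference on plain Scala collections, which stand in for the RDD so it runs without Spark. The sample `lines` and the names `body`, `flattened` and `pairs` are made up for illustration.

```scala
// Plain Scala List used in place of the RDD (an assumption, so no Spark context is needed).
// `lines` is a hypothetical sample in the same shape as the data in the question.
val lines = List("userId,movieId", "1,1172", "1,1405", "2,52")

// Drop the header line, as iter.drop(1) does for partition 0.
val body = lines.drop(1)

// flatMap merges all the split arrays into one flat sequence of strings.
val flattened = body.flatMap(line => line.split(","))

// map produces exactly one element per line, here a (userId, movieId) pair.
val pairs = body.map { line =>
  val parts = line.split(",")
  (parts(0).toInt, parts(1).toInt)
}

println(flattened) // List(1, 1172, 1, 1405, 2, 52)
println(pairs)     // List((1,1172), (1,1405), (2,52))
```

RDDs also have a map method, so replacing the trailing flatMap in the code from the question with a map like the one above should keep one pair per line.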