I have a simple question about Spark.
Imagine a file with this data:
00000000000
01000000000
02000000000
00000000000
01000000000
02000000000
03000000000
I want to create an RDD or a Spark DataFrame that splits this data at the lines that start with 00. The result would be an RDD of string arrays; for the example data above, it would look something like this:
[00000000000, 01000000000, 02000000000] // first row
[00000000000, 01000000000, 02000000000, 03000000000] // second row
In other words, every line starting with 00 begins a new row. That row is an array of strings holding the 00 line plus all the lines that follow it, up to (but not including) the next line starting with 00, where the next row of the RDD begins.
I would really appreciate a code example for this.
Thank you.
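A minimal PySpark sketch of the splitting described above, assuming each input file is small enough to be read whole by a single task. It uses SparkContext.wholeTextFiles, which yields (filename, contents) pairs, so a group can never be cut in half by a partition boundary. That could happen with textFile plus mapPartitions. The helper names group_records and build_rdd are hypothetical, and sc is assumed to be an existing SparkContext.

```python
from typing import Iterable, Iterator, List


def group_records(lines: Iterable[str], marker: str = "00") -> Iterator[List[str]]:
    """Yield lists of lines; a new list starts at every line beginning with `marker`.

    Hypothetical helper: plain Python, so it can be tested without Spark.
    """
    current: List[str] = []
    for line in lines:
        if line.startswith(marker) and current:
            # A marker line closes the previous group and opens a new one.
            yield current
            current = []
        if line:  # skip empty lines, e.g. produced by blank lines in the file
            current.append(line)
    if current:
        # Emit the last group, which has no following marker line to close it.
        yield current


def build_rdd(sc, path: str):
    """Hypothetical helper: returns an RDD whose elements are lists of strings.

    Assumes `sc` is an active SparkContext and every file under `path`
    fits in memory, because wholeTextFiles reads each file as one record.
    """
    return sc.wholeTextFiles(path).flatMap(
        # kv is (filename, full_file_contents); flatMap turns each yielded
        # list into one RDD element.
        lambda kv: group_records(kv[1].splitlines())
    )
```

To get a DataFrame with a single array<string> column instead, something like spark.createDataFrame(build_rdd(sc, path).map(lambda g: (g,)), ["lines"]) should work, assuming spark is an active SparkSession. Note that any lines appearing before the first 00 line in a file end up in their own group with this sketch.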