My situation:
I have a set of sources that I have to pass through several data layers, say layers A, B and C. Sometimes a source lands in layer A with no data, only its header. In my case, all data in A is Avro. I then have to move it from A to B, where until now the files could be CSV. Recently the requirements for layer B changed, and now it has to hold Parquet files too. I need these files to exist because layer C needs something to read, at least the header.
My problem:
The problem arises when I have to convert a header-only Avro file to a Parquet file. Is there any way, using Spark/Scala, to write a file that contains only the header in formats such as Avro or Parquet?
I have code that handles the header-only case for CSV by listing the columns and writing them out as CSV or plain text. But when I try to write Avro or Parquet, Spark only writes the _SUCCESS flag file. I have tried the different save modes and write options that I could find and that Spark accepts.
For reference, I am using Spark 2.3.1 and Scala 2.11.11.
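
To make the scenario concrete, here is a minimal sketch of the kind of write described above. It is not the original code: the object name, the two-column schema and the /tmp/layerB output paths are hypothetical placeholders, and the empty DataFrame is built by hand instead of being read from a real header-only Avro file in layer A.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object HeaderOnlyRepro {
  // Hypothetical schema standing in for the header of an empty source.
  val schema: StructType = StructType(Seq(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("header-only-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Empty DataFrame that carries only the schema (zero rows).
    val headerOnly = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    // CSV case: list the column names and write them as one line of plain text.
    val headerLine = headerOnly.columns.mkString(",")
    Seq(headerLine).toDS()
      .coalesce(1)
      .write.mode("overwrite")
      .text("/tmp/layerB/header_csv") // hypothetical path

    // Parquet case: the same empty DataFrame written as Parquet.
    // This is the kind of write the question is about.
    headerOnly.write.mode("overwrite")
      .parquet("/tmp/layerB/header_parquet") // hypothetical path

    spark.stop()
  }
}
```

The CSV branch works because the header is turned into an ordinary one-row Dataset of strings before writing, so there is always a row to emit. The Parquet branch has no rows at all, only a schema, which is the case where the output directory ends up with nothing but the _SUCCESS file.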