I am implementing some Spark Structured Streaming transformations over a Parquet data source. To read the data into a streaming DataFrame, one has to specify the schema explicitly (it cannot be inferred automatically). The schema is quite complex, and writing the schema code by hand would be tedious and error-prone.
Can you suggest a workaround? Currently I create a batch DataFrame beforehand (from the same data source), let Spark infer the schema, then store the schema in a Scala object and pass it to the Structured Streaming reader, roughly like this:
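(A minimal sketch of my current approach; the path and app name are placeholders.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder.appName("SchemaInference").getOrCreate()

// Batch read: Spark infers the schema from the Parquet files
val inferredSchema: StructType = spark.read.parquet("/data/input").schema

// Streaming read: reuse the inferred schema
val streamingDf = spark.readStream
  .schema(inferredSchema)
  .parquet("/data/input")
```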
I don't think this is a reliable or well-performing solution. Please suggest how to generate the schema code automatically, or how to persist the schema in a file and reuse it; the sketch below is roughly what I have in mind for the file-based option.
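(This round-trips the schema through Spark's built-in JSON representation; `schema.json` is a placeholder filename, and `inferredSchema` is the schema from the batch read above.)

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.types.{DataType, StructType}

// One-off: serialize the inferred schema to JSON and write it to a file
Files.write(
  Paths.get("schema.json"),
  inferredSchema.json.getBytes(StandardCharsets.UTF_8)
)

// At streaming-job startup: load the schema back without touching the data
val persistedSchema = DataType
  .fromJson(new String(Files.readAllBytes(Paths.get("schema.json")), StandardCharsets.UTF_8))
  .asInstanceOf[StructType]
```

Is something along these lines considered a reasonable practice, or is there a better, more idiomatic way?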