Yes, it's good. Schema inference causes the file to be read twice: once to infer the schema, and a second time to read the data into the Dataset.
From the Spark code for DataFrameReader (similar text is in DataStreamReader):

"This function will go through the input once to determine the input schema if inferSchema is enabled. To avoid going through the entire data once, disable inferSchema option or specify the schema explicitly using schema."
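To make the cost concrete, here is a toy model in plain Python. It is not Spark's implementation, only an illustration of the behaviour described in that comment. The hypothetical read_csv scans the whole input once to guess column types (like inferSchema) before the real read, and the hypothetical CountingSource counts how many full passes were made. With an explicit schema only one pass is needed.

```python
import csv
import io


class CountingSource:
    """In-memory CSV text that counts how often it is opened (one open = one full pass)."""

    def __init__(self, text):
        self.text = text
        self.passes = 0

    def open(self):
        self.passes += 1
        return io.StringIO(self.text)


def _infer_type(values):
    # Try the narrowest type first and widen: int -> float -> str.
    for cast in (int, float):
        try:
            for v in values:
                cast(v)
            return cast
        except ValueError:
            continue
    return str


def read_csv(source, schema=None, infer_schema=False):
    """Toy reader returning (schema, rows); schema is a list of (name, type) pairs."""
    if schema is None and infer_schema:
        # Extra full pass over the input, only to determine the column types.
        with source.open() as f:
            reader = csv.reader(f)
            header = next(reader)
            columns = list(zip(*reader))
        schema = [(name, _infer_type(col)) for name, col in zip(header, columns)]
    # The actual data pass.
    with source.open() as f:
        reader = csv.reader(f)
        header = next(reader)
        if schema is None:
            # No inference and no schema given: every column stays a string.
            schema = [(name, str) for name in header]
        rows = [tuple(t(v) for (_, t), v in zip(schema, row)) for row in reader]
    return schema, rows
```

Calling read_csv(src, infer_schema=True) leaves src.passes at 2, while passing schema=[("id", int), ...] leaves it at 1, which is the same trade-off the Spark comment describes.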
However, maintaining explicit schemas for 100 Datasets with 200 columns each can be difficult, so you should also keep maintainability in mind. The typical answer is: it depends :) For schemas that are not too big or not too hard to work out, but with large files, I recommend a custom schema written in code, since it avoids the extra pass over the data.
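One way to keep hand-written schemas manageable is to store them as plain data and render them to a DDL string, which DataFrameReader.schema(...) accepts (Spark 2.3+) as an alternative to a StructType. This is a minimal sketch, assuming the column lists live in a dict; to_ddl, SCHEMAS and the "orders" entry are hypothetical names, and the type names must be valid Spark SQL types.

```python
# Hypothetical registry: one list of (column name, Spark SQL type) per dataset.
SCHEMAS = {
    "orders": [
        ("order_id", "BIGINT"),
        ("amount", "DECIMAL(10,2)"),
        ("created_at", "TIMESTAMP"),
    ],
}


def to_ddl(columns):
    """Render (name, type) pairs as a DDL string such as "`id` INT, `name` STRING"."""
    parts = []
    for name, sql_type in columns:
        # Backtick-quote every name; a literal backtick is escaped by doubling it.
        quoted = "`" + name.replace("`", "``") + "`"
        parts.append(f"{quoted} {sql_type}")
    return ", ".join(parts)
```

Usage would then be something like spark.read.schema(to_ddl(SCHEMAS["orders"])).csv(path), where path is a placeholder. Keeping the schemas as data means they can live in one module or a config file instead of being spread across 100 reader calls.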