I need to ingest a very wide fixed-width dataset into Parquet columns. Currently I am reading the wide dataset as an RDD in Scala, splitting out the columns with the substring function, and then writing to Parquet.
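For reference, here is a minimal sketch of what I'm doing now. The column layout, paths, and object name are placeholders; the real file has far more columns:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FixedWidthToParquet {
  // Hypothetical layout: (columnName, startOffset, length).
  // The real dataset has many more columns than this.
  val layout = Seq(("col1", 0, 10), ("col2", 10, 5), ("col3", 15, 20))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FixedWidthToParquet").getOrCreate()

    // All columns read as strings; one StructField per fixed-width slice
    val schema = StructType(layout.map { case (name, _, _) => StructField(name, StringType) })

    // Read raw lines as an RDD and slice each record by offset
    val rows = spark.sparkContext
      .textFile("/path/to/fixed_width_input")   // placeholder input path
      .map { line =>
        Row.fromSeq(layout.map { case (_, start, len) => line.substring(start, start + len) })
      }

    spark.createDataFrame(rows, schema)
      .write
      .parquet("/path/to/parquet_output")        // placeholder output path

    spark.stop()
  }
}
```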
There are currently close to 10 million fixed-width records, and it takes about 2 days to load the data.
Can anyone please tell me the most efficient way to read this wide dataset in Scala or Java?