I am importing a fixed-width file from a source system. The dataset is very wide: each record is a fixed-width string close to 120,000 characters long, and I need to extract about 20K columns from it and write the result to Parquet.
Can anyone suggest performance optimization techniques that would reduce the file reading time? I am currently reading the source file as an RDD, roughly as in the sketch below, but it takes a lot of time.
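For reference, this is a minimal sketch of the kind of approach I mean (the column names, offsets, lengths, and paths are placeholders, not my real layout): each line is read as text and the columns are sliced out by substring before writing to Parquet.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FixedWidthToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FixedWidthToParquet").getOrCreate()

    // Hypothetical layout: (columnName, startOffset, lengthInChars) per field.
    // 20,000 columns of 6 characters each roughly matches a ~120,000-character record.
    val layout: IndexedSeq[(String, Int, Int)] =
      (0 until 20000).map(i => (s"col_$i", i * 6, 6))

    val schema = StructType(layout.map { case (name, _, _) => StructField(name, StringType) })

    // Read each fixed-width line as text and slice every column out by substring.
    val rowsRdd = spark.sparkContext
      .textFile("/path/to/fixed_width_input.txt") // placeholder input path
      .map { line =>
        Row.fromSeq(layout.map { case (_, start, len) => line.substring(start, start + len) })
      }

    spark.createDataFrame(rowsRdd, schema)
      .write
      .mode("overwrite")
      .parquet("/path/to/output.parquet") // placeholder output path

    spark.stop()
  }
}
```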
Can anyone suggest a different approach that would reduce the reading time, for example using a Java IO stream instead?