I have a program that reads a CSV file from the local filesystem. Spark (run in local mode) is actually using all 16 cores of the instance, so I have 16 tasks running in parallel.
Now, what I want to do is tune its performance when reading the file.
When checking the Spark UI, I found that each task reads 128MB of the file as its input size (the default value of Hadoop's block size). Since the instance has 120GB of RAM, I would like to increase the input size per task.
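For reference, here is roughly what the read looks like. This is a minimal Scala sketch of the current setup; the app name, header option, and file path are placeholders rather than my exact code:

```scala
import org.apache.spark.sql.SparkSession

// Local mode, using all 16 cores of the instance (placeholder app name).
val spark = SparkSession.builder()
  .appName("csv-read")
  .master("local[16]")
  .getOrCreate()

// Read the CSV from the local filesystem (placeholder path).
// Each resulting task currently shows ~128MB as its input size in the Spark UI.
val df = spark.read
  .option("header", "true")
  .csv("/path/to/file.csv")

// Trigger the read so the tasks actually run and appear in the UI.
df.count()
```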
What configuration should I set to do so?