I have a large dataset of 5 million items, each with an ID, cost, etc. I have been using sqlContext in the PySpark shell to load the JSON, create a DataFrame, and then apply all the required operations on that DataFrame.
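For context, this is roughly what I run in the shell (the file path and column names are placeholders for my actual data):

```python
# sc and sqlContext are already provided by the PySpark shell
df = sqlContext.read.json("/path/to/items.json")
df.printSchema()  # columns: id, cost, ...
```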
I'm new to Spark, and my question is: whenever I perform an operation on my DataFrame, whether through built-in functions (e.g. loading the JSON with sqlContext.read.json(filePath)) or through a UDF (a minimal sketch of what I mean is below), is it automatically multithreaded, or do I need to specify something explicitly to make it multithreaded? If it is multithreaded, how can I view and change the number of threads currently being used?
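As an example of the UDF case, this is the kind of operation I have in mind (the add_markup function and the 10% rate are made-up placeholders, not my real logic):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Placeholder UDF: add a 10% markup to the cost column
add_markup = udf(lambda cost: cost * 1.1, DoubleType())

df2 = df.withColumn("cost_with_markup", add_markup(df["cost"]))
df2.show(5)
```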