How to read a CSV from an S3 bucket using the spark.read.csv function?

Question

I want to read data from an S3 bucket so that the result looks like the picture below. I want Spark to treat the first line of the raw data as a header and to infer the data types of all columns. What are the right arguments for this?

[image: expected result, not reproduced]

The code is: df = spark.read.csv("s3a://raw-recipes-clean-upgrad/RAW_recipes_cleaned.csv", <argument to use the header>, <argument to infer the data types>)

I'm stuck on this. Any recommendations? Thank you.

score 0 · Answer 1 · answered Apr 21 '23 at 16:01

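Try adding the header and inferSchema options, as shown below.

Spark will then treat the first line as the header and infer the data type of each column. Note that inferSchema makes Spark take an extra pass over the data.

For all the available options of the Spark CSV reader, see the CSV data source page in the Spark documentation.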

df = spark.read.option("header", True).option("inferSchema", True).csv("s3a://raw-recipes-clean-upgrad/RAW_recipes_cleaned.csv")
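The same two options can also be passed as keyword arguments to csv(), which is closer to the shape of the call in the question. The following is only a sketch: the helper name read_csv_with_header is made up for illustration, and it assumes an existing SparkSession named spark that is already configured with S3A access (hadoop-aws on the classpath and credentials set).

```python
def read_csv_with_header(spark, path):
    """Read a CSV file with a header row and inferred column types.

    `spark` is assumed to be an existing SparkSession; this helper
    (a hypothetical name) does not create or configure one.
    """
    # header=True: use the first line of the file as column names.
    # inferSchema=True: scan the data once more to guess each column's type.
    return spark.read.csv(path, header=True, inferSchema=True)
```

With that in place, df = read_csv_with_header(spark, "s3a://raw-recipes-clean-upgrad/RAW_recipes_cleaned.csv") followed by df.printSchema() should show the column names taken from the header and the inferred types. If the inferred types turn out wrong, passing an explicit schema instead of inferSchema is the usual alternative.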
