I am trying to read a Parquet file stored in HDFS using Python (PySpark). The code below does not open files in HDFS. How can I change it so that it does?
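from pyspark.sql import SQLContext

# `spark` is assumed to be an existing SparkSession (for example, the one the pyspark shell provides)
sc = spark.sparkContext
sqlContext = SQLContext(sc)
# This is the call that does not open the file in HDFS
df = sqlContext.read.parquet('path-to-file/commentClusters.parquet')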
I would also like to save the DataFrame as a CSV file.
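
To make the goal concrete, here is a minimal sketch of what the read and the CSV export might look like. It is not a confirmed fix: it assumes the problem is that the path is not a fully qualified `hdfs://` URI. The host `namenode`, the port `8020`, the output directory name `commentClusters_csv`, and the helper names `hdfs_uri`, `parquet_to_csv`, and `main` are all placeholders introduced for illustration; `path-to-file` is left exactly as in the code above.

```python
def hdfs_uri(path, host="namenode", port=8020):
    """Build a fully qualified hdfs:// URI.

    The default host and port are placeholders (assumptions), not real
    cluster values; replace them with your NameNode address.
    """
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


def parquet_to_csv(spark, src, dst):
    """Read a Parquet dataset from `src` and write it to `dst` as CSV."""
    df = spark.read.parquet(src)
    # Spark writes a directory of part files at `dst`, not a single CSV file.
    df.write.mode("overwrite").option("header", True).csv(dst)
    return df


def main():
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-to-csv").getOrCreate()
    src = hdfs_uri("path-to-file/commentClusters.parquet")
    # Hypothetical output location, chosen only for this example.
    dst = hdfs_uri("path-to-file/commentClusters_csv")
    parquet_to_csv(spark, src, dst)
    spark.stop()
```

Calling `main()` on a machine that can reach the cluster would run the whole flow. If the cluster's default filesystem is already set to HDFS, a plain absolute path such as `/path-to-file/commentClusters.parquet` may work without the `hdfs://` prefix, but that depends on the cluster configuration.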