This question has a partial answer here: Write to multiple outputs by key Spark - one Spark job
But I want to save a DataFrame to multiple CSV files.
from pyspark.sql import Row

df = sqlContext.createDataFrame([
    Row(name=u'name1', website=u'http://1', url=u'1'),
    Row(name=u'name2', website=u'http://1', url=u'1'),
    Row(name=u'name3', website=u'https://fsadf', url=u'2'),
    Row(name=u'name4', website=None, url=u'3'),
])
df.write.format('com.databricks.spark.csv').partitionBy("name").save("dataset.csv")
I'm using spark-csv (https://github.com/databricks/spark-csv) to handle the CSV data.
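The only workaround I've come up with so far is to filter on each distinct key and save separately, roughly like the sketch below (the dataset/name=... path layout is just my own choice). But this launches one job per key, which I'd like to avoid:

# Workaround sketch: one filtered write per distinct key.
# Runs a separate Spark job for every key, so it scales badly.
for r in df.select("name").distinct().collect():
    (df.filter(df["name"] == r["name"])
       .write
       .format("com.databricks.spark.csv")
       .save("dataset/name=%s" % r["name"]))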
One more thing: df.write.partitionBy("column").json("dataset") saves the data into multiple directories named column=value1, column=value2, etc., but the partitioning column itself is no longer present in the written records. What if I need that column in the output dataset?
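The only idea I have is to duplicate the column under a throwaway name (name_part below is just a name I made up), so the copy drives the directory layout while the original stays in the rows. Is there a cleaner way?

from pyspark.sql import functions as F

# Sketch of the duplicate-column idea: partition on the copy,
# so the original "name" column is kept inside each JSON record.
(df.withColumn("name_part", F.col("name"))
   .write
   .partitionBy("name_part")
   .json("dataset"))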