I want to export my data to separate text files, one per distinct FIPS value; I can do it with this hack:
# one Spark job per distinct FIPS value
for r in sqlContext.sql("SELECT DISTINCT FIPS FROM MY_DF").map(lambda row: row.FIPS).collect():
    sqlContext.sql("SELECT * FROM MY_DF WHERE FIPS = '%s'" % r).rdd.saveAsTextFile('county_{}'.format(r))
What is the right way to do it with Spark 1.3.1/Python DataFrames? I want to do it in a single job as opposed to N (or N + 1) jobs.
Maybe something like:
saveAsTextFileByKey()
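To be concrete, this is the kind of single-pass call I am imagining; saveAsTextFileByKey does not exist in Spark 1.3.1, so the method name, signature, and output layout below are purely illustrative:

# Purely hypothetical -- no such method exists in Spark 1.3.1.
# The idea: key each row by FIPS and write everything in one pass,
# producing one output directory (or file) per distinct key.
keyed = my_df.rdd.map(lambda row: (row.FIPS, row))
keyed.saveAsTextFileByKey('counties')  # e.g. counties/01001/, counties/01003/, ...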