Given the following snippet (Spark version: 1.5.2):
rdd.toDF().write.mode(SaveMode.Append).parquet(pathToStorage)
which saves the RDD's data to flat Parquet files, I would like my storage to be structured like:
country/
  year/
    yearmonth/
      yearmonthday/
The data itself contains a country column and a timestamp column, so I started with this method. However, since my data only has a raw timestamp, I can't partition the whole thing by year/yearmonth/yearmonthday, as those are not actual columns...
This solution also seemed promising, but I can't manage to adapt it to Parquet files...
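For reference, here is the closest I've got: deriving the partition columns from the timestamp with Spark SQL functions, then letting `partitionBy` build the directory layout. This is only a sketch; it assumes the timestamp column is named `timestamp` and holds epoch seconds, and the derived column names (`year`, `yearmonth`, `yearmonthday`) are my own:

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._

// Derive the partition columns from the raw timestamp.
// Assumption: `timestamp` is epoch seconds; drop from_unixtime
// if it is already a timestamp/date type.
val df = rdd.toDF()
  .withColumn("year", year(from_unixtime(col("timestamp"))))
  .withColumn("yearmonth", date_format(from_unixtime(col("timestamp")), "yyyyMM"))
  .withColumn("yearmonthday", date_format(from_unixtime(col("timestamp")), "yyyyMMdd"))

// partitionBy turns each listed column into a directory level.
df.write
  .mode(SaveMode.Append)
  .partitionBy("country", "year", "yearmonth", "yearmonthday")
  .parquet(pathToStorage)
```

One caveat I noticed: `partitionBy` writes Hive-style directories (`country=FR/year=2015/...`) rather than the bare `country/year/...` layout above, so I'm not sure this is the right approach.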
Any idea?