Let's say I have a dataframe with the following columns:
# id | name | 01-Jan-10 | 01-Feb-10 | ... | 01-Jan-11 | 01-Feb-11
# ----------------------------------------------------------------
#  1 | a001 |         0 |        32 | ... |        14 |       108
#  1 | a002 |        80 |         0 | ... |         0 |        92
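For reproducibility, here's a minimal sketch of how I'm building a toy version of this dataframe (only four month columns shown; the real data has a column for every month):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy version of the wide input; the real df has many more month columns.
    df = spark.createDataFrame(
        [(1, "a001", 0, 32, 14, 108),
         (1, "a002", 80, 0, 0, 92)],
        ["id", "name", "01-Jan-10", "01-Feb-10", "01-Jan-11", "01-Feb-11"],
    )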
I want to expand this into a table like this:
# id | name | Jan | Feb | ... | Year
# -----------------------------------
#  1 | a001 |   0 |  32 | ... | 2010
#  1 | a001 |  14 | 108 | ... | 2011
#  1 | a002 |  80 |   0 | ... | 2010
#  1 | a002 |   0 |  92 | ... | 2011
That is, I'd like to split the date columns out into one row per year, capturing the values in per-month columns.
How might this be accomplished in PySpark (Python + Spark)? I've been attempting to collect the df's data so I can iterate over it and extract each field to write out row by row, but I wondered whether there's a more clever Spark function that would help with this. (I'm new to Spark.)
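In case it helps, this is roughly the collect-and-iterate version I've been attempting, building on the toy df above; parsing the column names with the %d-%b-%y format is my assumption based on the headers:

    from datetime import datetime

    # Every column other than id/name is a month column.
    date_cols = [c for c in df.columns if c not in ("id", "name")]
    month_order = ["Jan", "Feb"]  # the real data would list all 12 months

    rows = []
    for r in df.collect():  # pulls everything to the driver, which feels wrong
        by_year = {}
        for c in date_cols:
            d = datetime.strptime(c, "%d-%b-%y")  # e.g. "01-Jan-10" -> 2010-01-01
            by_year.setdefault(d.year, {})[d.strftime("%b")] = r[c]
        for year in sorted(by_year):
            months = by_year[year]
            rows.append((r["id"], r["name"],
                         *(months.get(m) for m in month_order), year))

    result = spark.createDataFrame(rows, ["id", "name", *month_order, "Year"])
    result.show()

This produces the desired output on the toy data, but collecting everything to the driver obviously won't scale, hence the question.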