I need to convert the following dataframe:
╔══════╦════════╦════════╦════════╗
║ Year ║  ColA  ║  ColB  ║  ColC  ║
╠══════╬════════╬════════╬════════╣
║ 2017 ║   1    ║   2    ║   3    ║
║ 2018 ║   4    ║   5    ║   6    ║
║ 2019 ║   7    ║   8    ║   9    ║
╚══════╩════════╩════════╩════════╝
Into this:
╔══════╦════════╦═══════╗
║ Year ║ColName ║ Value ║
╠══════╬════════╬═══════╣
║ 2017 ║  ColA  ║   1   ║
║ 2017 ║  ColB  ║   2   ║
║ 2017 ║  ColC  ║   3   ║
║ 2018 ║  ColA  ║   4   ║
║ 2018 ║  ColB  ║   5   ║
║ 2018 ║  ColC  ║   6   ║
║ 2019 ║  ColA  ║   7   ║
║ 2019 ║  ColB  ║   8   ║
║ 2019 ║  ColC  ║   9   ║
╚══════╩════════╩═══════╝
The solution needs to support any number of value columns besides the first "Year" column (there could be one or many). It should also be generic: no hard-coded column names anywhere; the column names should be read directly from the original dataframe.
I'm using Databricks with a notebook written in Scala, and I'm very new to both Spark and Scala.
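For reference, here is roughly how I build the sample dataframe in the notebook (hypothetical sample data matching the table above; Databricks notebooks expose a prebuilt SparkSession named spark):

// Hypothetical sample data matching the input table above.
// `spark` is the SparkSession that Databricks notebooks provide.
import spark.implicits._

val df = Seq(
  (2017, 1, 2, 3),
  (2018, 4, 5, 6),
  (2019, 7, 8, 9)
).toDF("Year", "ColA", "ColB", "ColC")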
UPDATE
I've found this solution in Python that works well, but I'm having a hard time converting it to Scala.
from pyspark.sql import functions as F

def columnsToRows(df, by):
    # Filter dtypes and split into column names and type descriptions.
    # Only keep columns that are not in "by".
    cols, dtypes = zip(*((c, t) for (c, t) in df.dtypes if c not in by))
    # Create and explode an array of (column_name, column_value) structs,
    # producing one output row per value column.
    kvs = F.explode(F.array([
        F.struct(F.lit(c.strip()).alias("ColName"), F.col(c).alias("Value")) for c in cols
    ])).alias("kvs")
    return df.select(by + [kvs]).select(by + ["kvs.ColName", "kvs.Value"])
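Here is my best guess so far at a direct Scala translation. This is an untested sketch: I've carried over the columnsToRows name and the ColName/Value aliases from the Python version, and I'm assuming all value columns share a compatible type, since array() requires that (true for my all-integer example). Does this look right?

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

def columnsToRows(df: DataFrame, by: Seq[String]): DataFrame = {
  // Every column that is not one of the identifier columns in "by".
  val cols = df.columns.filterNot(by.contains)

  // Build an array of (ColName, Value) structs, one struct per value column,
  // then explode it so each struct becomes its own row.
  val kvs = explode(array(
    cols.map(c => struct(lit(c).alias("ColName"), col(c).alias("Value"))): _*
  )).alias("kvs")

  val byCols = by.map(col)
  df.select((byCols :+ kvs): _*)
    .select((byCols :+ col("kvs.ColName") :+ col("kvs.Value")): _*)
}

// Usage on the example dataframe:
// val longDf = columnsToRows(df, Seq("Year"))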