I have a dataframe with the following schema:
subjectID, feature001, feature002, feature003, ..., feature299
Let's say my dataframe looks like:
123,0.23,0.54,0.35,...,0.26
234,0.17,0.49,0.47,...,0.69
Now, what I want is:
subjectID, featureID, featureValue
The above dataframe would look like:
123,001,0.23
123,002,0.54
123,003,0.35
......
123,299,0.26
234,001,0.17
234,002,0.49
234,003,0.47
......
234,299,0.69
I know how to achieve it if i have only several columns:
newDF = df.select($"subjectID", expr("stack(3, 'feature001', 001, 'feature002', 002, 'feature003', 003) as (featureID, featureValue)"))
However, I am looking for a way to deal with 300 columns.