I have a PySpark DataFrame with multiple columns, as follows:
name  col1  col2  col3
A     1     6     7
B     2     7     6
C     3     8     5
D     4     9     4
E     5     8     3
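(For a reproducible example, this is roughly how the sample data would be built; the SparkSession name spark is assumed.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above
df = spark.createDataFrame(
    [("A", 1, 6, 7), ("B", 2, 7, 6), ("C", 3, 8, 5),
     ("D", 4, 9, 4), ("E", 5, 8, 3)],
    ["name", "col1", "col2", "col3"],
)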
I want to create a new DataFrame in PySpark by gathering the column names and column values of col1, col2, and col3 into two new columns, say new_col and new_col_val, spread across rows, so the result looks like this:

name  new_col  new_col_val
A     col1     1
A     col2     6
A     col3     7
B     col1     2
B     col2     7
B     col3     6
...   ...      ...
I did the same in R using tidyr's gather():

library(tidyr)
df1 <- gather(df, new_col, new_col_val, -name)
I was thinking of creating 3 separate DataFrames, each holding the name column plus one of the value columns from the original DataFrame, and then appending them together, but my data has more than 2.5 million rows and around 60 columns, so creating that many DataFrames seems like a bad idea (a rough sketch of that approach is below). Can anyone please tell me how I can do this in PySpark?
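For concreteness, the column-by-column approach I was hoping to avoid would look roughly like this (a sketch only; it assumes the original DataFrame is named df):

from pyspark.sql import functions as F

# Build one narrow DataFrame per value column, then union them.
# With ~60 columns this means ~60 selects and ~59 unions, which
# is exactly what I'd like to avoid.
value_cols = ["col1", "col2", "col3"]
long_df = None
for c in value_cols:
    part = df.select(
        "name",
        F.lit(c).alias("new_col"),      # column name as a literal
        F.col(c).alias("new_col_val"),  # that column's values
    )
    long_df = part if long_df is None else long_df.union(part)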