
I'm trying to transpose some rows of my table into columns. I found this previous post: Transpose column to row with Spark

I actually want to go the opposite way. Initially, I have:

+-----+--------+-----------+   
|  A  | col_id | col_value |  
+-----+--------+-----------+  
|  1  |   col_1|        0.0|  
|  1  |   col_2|        0.6|       
|  ...|     ...|        ...|        
|  2  |   col_1|        0.6|  
|  2  |   col_2|        0.7|   
|  ...|     ...|        ...|      
|  3  |   col_1|        0.5|  
|  3  |   col_2|        0.9|  
|  ...|     ...|        ...|  

And what I want is:

+-----+-----+-----+-------+  
|  A  |col_1|col_2|col_...|  
+-----+-----+-----+-------+  
|  1  |  0.0|  0.6|  ...  |  
|  2  |  0.6|  0.7|  ...  |  
|  3  |  0.5|  0.9|  ...  |  
|  ...|  ...|  ...|  ...  |  

How can I do it? Thanks!


1 Answer


You can use 'when' to emulate an SQL CASE-like expression. With it you redistribute the data over new columns: while building 'col_1', if 'col_id' is anything other than 'col_1' you simply put 0. After that, a simple sum per value of 'A' reduces the number of rows.

from pyspark.sql import functions as F

# Spread col_value into one column per col_id value; rows belonging
# to another col_id contribute 0
df2 = df.select(df.A,
                F.when(df.col_id == 'col_1', df.col_value).otherwise(0).alias('col_1'),
                F.when(df.col_id == 'col_2', df.col_value).otherwise(0).alias('col_2'))

# Collapse the rows for each A by summing each new column
df2.groupBy('A').agg(F.sum('col_1').alias('col_1'),
                     F.sum('col_2').alias('col_2')).show()