I have a DataFrame with 3 columns and 20,000 rows. I need to convert all 20,000 transid values into columns. The input table looks like this:
prodid | transid | flag |
---|---|---|
A | 1 | 1 |
B | 2 | 1 |
C | 3 | 1 |
and so on.
Expected output, with up to 20,000 columns:
prodid | 1 | 2 | 3 |
---|---|---|---|
A | 1 | 1 | 1 |
B | 1 | 1 | 1 |
C | 1 | 1 | 1 |
I tried the pivot/transpose functions, but they take too long on high-volume data: converting 20,000 rows to columns takes around 10 hours. For example:

    val array = a1.select("trans_id").distinct.collect.map(x => x.getString(0)).toSeq
    val a2 = a1.groupBy("prodid").pivot("trans_id", array).sum("flag")
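
For reference, the attempt above can be reproduced on a toy dataset. This is only a minimal sketch: the `PivotRepro` object, the `pivotFlags` helper, the local `SparkSession`, and the three sample rows are assumptions made for illustration, and `trans_id` is stored as a string because the snippet reads it with `getString(0)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PivotRepro {
  // Same logic as the snippet above: collect the distinct ids, then pivot on them.
  def pivotFlags(a1: DataFrame): DataFrame = {
    val ids = a1.select("trans_id").distinct.collect.map(_.getString(0)).toSeq
    a1.groupBy("prodid").pivot("trans_id", ids).sum("flag")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-repro")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for a1 (hypothetical rows shaped like the input table).
    val a1 = Seq(("A", "1", 1), ("B", "2", 1), ("C", "3", 1))
      .toDF("prodid", "trans_id", "flag")

    pivotFlags(a1).show()
    spark.stop()
  }
}
```

Each distinct `trans_id` becomes one output column, so the width of the result grows with the number of distinct ids.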
Pivot is fast when I use it on 200-300 rows, but it does not scale as the number of rows grows. Can anyone please help me find a solution? Is there any way to avoid the pivot function, since it only seems suitable for low-volume conversion? How should I deal with high-volume data?

I need this conversion for matrix multiplication. My input for the multiplication will look like the table below, and the final result will be the matrix product.

|col1|col2|col3|col4|
|----|----|----|----|
|1 | 0 | 1 | 0 |
|0 | 1 | 0 | 0 |
|1 | 1 | 1 | 1 |
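
Since the end goal is matrix multiplication, one direction that might avoid the wide table altogether is to keep the data in long (row, column, value) form and pass it to Spark MLlib's distributed matrices. The sketch below is an assumption about what could work, not a measured fix. The `SparseMultiplySketch` object and `timesTranspose` helper are hypothetical names, the row and column indices are assumed to already be numeric (a `prodid` such as `A` would first need mapping to an index), and multiplying the matrix by its own transpose is just a stand-in for whatever product is actually needed.

```scala
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.SparkSession

object SparseMultiplySketch {
  // Builds a distributed matrix from non-zero cells and returns M * M^T as a local matrix.
  // Collecting to a local matrix is only sensible for small results such as this demo.
  def timesTranspose(spark: SparkSession,
                     cells: Seq[(Long, Long, Double)],
                     nRows: Long,
                     nCols: Long): Matrix = {
    val entries = spark.sparkContext.parallelize(
      cells.map { case (i, j, v) => MatrixEntry(i, j, v) })
    val m = new CoordinateMatrix(entries, nRows, nCols).toBlockMatrix().cache()
    m.multiply(m.transpose).toLocalMatrix()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sparse-multiply")
      .getOrCreate()

    // Non-zero cells of the 3 x 4 example matrix above, as (row, column, value).
    val cells = Seq(
      (0L, 0L, 1.0), (0L, 2L, 1.0),
      (1L, 1L, 1.0),
      (2L, 0L, 1.0), (2L, 1L, 1.0), (2L, 2L, 1.0), (2L, 3L, 1.0))

    println(timesTranspose(spark, cells, 3L, 4L))
    spark.stop()
  }
}
```

In this form only the non-zero cells are stored, so no 20,000-column DataFrame is ever materialized.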