0

I have input dataframe as below.

 +-------+----------------+------------+   
 |ID     |Title           |values      |    
 +-------+----------------+------------+  
 |ID-1   |First Name      |Jolly       |  
 |ID-1   |Middle Name     |Jr          |  
 |ID-1   |Last Name       |Hudson      |  
 |ID-2   |First Name      |Kathy       |  
 |ID-2   |Last Name       |Oliver      |  
 |ID-3   |Last Name       |Short       |  
 |ID-3   |Middle Name     |M           |  
 |ID-4   |First Name      |Denver      |  
 +-------+----------------+------------+   

I require the output as follows:

 +-------+----------------+---------------+--------------+  
 |ID     |First Name      |Middle Name    | Last Name    |   
 +-------+----------------+---------------+--------------+ 
 |ID-1   |Jolly           |Jr             | Hudson       |      
 |ID-2   |Kathy           |null           | Oliver       | 
 |ID-3   |null            |M              | Short        |
 |ID-4   |Denver          |null           | null         |
 +-------+----------------+---------------+--------------+   

Please suggest the possible solutions to obtain this result.
Thanks in advance.

Sindu
  • 3
  • 2
  • I am new to programming and did not know that this is called pivoting.. Hence wouldn't have understood that the post you mentioned is similar one – Sindu Aug 31 '18 at 02:03

1 Answers1

0

Here's one approach that groups the dataset by pivoting the Title to aggregate the Values using first:

val df = Seq(
  ("ID-1", "First Name", "Jolly"),
  ("ID-1", "Middle Name", "Jr"),
  ("ID-1", "Last Name", "Hudson"),
  ("ID-2", "First Name", "Kathy"),
  ("ID-2", "Last Name", "Oliver"),
  ("ID-3", "Last Name", "Short"),
  ("ID-3", "Middle Name", "M"),
  ("ID-4", "First Name", "Denver")
).toDF("ID", "Title", "Values")

df.
  groupBy("ID").pivot("Title").agg(first($"Values")).
  show(false)
// +----+----------+---------+-----------+
// |ID  |First Name|Last Name|Middle Name|
// +----+----------+---------+-----------+
// |ID-1|Jolly     |Hudson   |Jr         |
// |ID-3|null      |Short    |M          |
// |ID-4|Denver    |null     |null       |
// |ID-2|Kathy     |Oliver   |null       |
// +----+----------+---------+-----------+
Leo C
  • 22,006
  • 3
  • 26
  • 39
  • Hi, This was working initially but now it throws following exception : java.lang.IllegalArgumentException: Field "null" does not exist. Please suggest the possible reason for this and solution too. Thanks in Advance – Sindu Sep 12 '18 at 14:09
  • @Sindu, nullifying some of the `Title` column values in the sample data would generate an additional `null` column by `pivot`, but that alone wouldn't result in an `IllegalArgumentException: Field "null" does not exist` error – unless the generated column is subsequently de-selected then referenced in some method afterward. If that doesn't seem to be the case, I would suggest that you post a separate question with minimal sample code and data that can reproduce the described error. – Leo C Sep 12 '18 at 16:29