
I have a DataFrame that looks like this:

+------+------------+------------------+
|UserID|Attribute   | Value            |
+------+------------+------------------+
|123   |  City      | San Francisco    |
|123   |  Lang      | English          |
|111   |  Lang      | French           |
|111   |  Age       | 23               |
|111   |  Gender    | Female           |
+------+------------+------------------+

There are only a few distinct attributes (say 20 at most), and any of them can be missing for a given user.

I want to convert this DataFrame to:

+-----+--------------+---------+-----+--------+
|User |City          | Lang    | Age | Gender |
+-----+--------------+---------+-----+--------+
|123  |San Francisco | English | NULL| NULL   |
|111  |NULL          | French  | 23  | Female |
+-----+--------------+---------+-----+--------+

I'm quite new to Spark and Scala.

Prasad Khode
Kalpish Singhal
  • Is it worth adding an attempt at this? It is at risk of closure if it is just a set of requirements. If you can research the problem and try, that is often much appreciated. – halfer Dec 28 '17 at 11:46
  • @halfer I tried getting the distinct Attributes and defining a new DataFrame from them, but that seems to cause issues when mapping the values into the desired columns. – Kalpish Singhal Dec 28 '17 at 12:16
  • I don't know this technology, but that information, and the code you used, might be useful edited into the question. It is good to show that you have genuinely tried something, given that many people here do not even bother doing a search. – halfer Dec 28 '17 at 12:18

1 Answer

You can use pivot to get the desired output:

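import org.apache.spark.sql.functions._
// Only needed for $"col" syntax and toDF; `sparkSession` is your SparkSession instance.
import sparkSession.sqlContext.implicits._

// One row per UserID, one column per distinct Attribute;
// first("Value") picks the value for each (UserID, Attribute) pair.
df.groupBy("UserID")
  .pivot("Attribute")
  .agg(first("Value"))
  .show()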

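This prints the following. Users without a given attribute get null in that column, and the pivoted columns come out sorted by attribute name: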

+------+----+-------------+------+-------+
|UserID| Age|         City|Gender|   Lang|
+------+----+-------------+------+-------+
|   111|  23|         null|Female| French|
|   123|null|San Francisco|  null|English|
+------+----+-------------+------+-------+
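Since the question says the set of attributes is small and bounded, the pivot values can also be passed explicitly. The following is a minimal sketch, not a definitive implementation: it assumes the four attribute names from the sample data are the full list, rebuilds the sample DataFrame locally, and uses a local SparkSession named `spark` (in spark-shell one already exists). The names `attributes` and `wide` are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Assumption: a local session for experimenting.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  ("123", "City", "San Francisco"),
  ("123", "Lang", "English"),
  ("111", "Lang", "French"),
  ("111", "Age", "23"),
  ("111", "Gender", "Female")
).toDF("UserID", "Attribute", "Value")

// Assumption: the complete list of attributes is known up front.
val attributes = Seq("City", "Lang", "Age", "Gender")

val wide = df.groupBy("UserID")
  .pivot("Attribute", attributes)      // explicit values fix the column order
  .agg(first("Value"))
  .withColumnRenamed("UserID", "User") // match the header asked for in the question

wide.show()
```

Passing the values means Spark does not have to compute the distinct attributes first, and the columns appear in the order listed. Attributes present in the data but missing from the list are silently left out, so this only fits when the list really is complete.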
Prasad Khode