The schema of my DataFrame is:

scala> x.printSchema()
root
 |-- pangaea_customer_id: string (nullable = true)
 |-- persona_model: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- score: double (nullable = true)
 |    |    |-- tag: string (nullable = true)
 |-- process_date: string (nullable = true)

Here is an example row from this DataFrame:

x.show(1)

+--------------------+--------------------+-------------+
| pangaea_customer_id|       persona_model| process_date|
+--------------------+--------------------+-------------+
|000000E91010441BB...|Map(Tech -> [0.21...|2018-05-16-01|
+--------------------+--------------------+-------------+

I want to create a new DataFrame with two columns: x.pangaea_customer_id and its respective score (which is inside the map).

Here is what I have tried so far:

val newDF = oldDF.select(col("pangaea_customer_id"), col("persona_model")("Tech")("score"))

But this only returns the score for entries whose key is "Tech". I want the score values for all customers, whatever the key. What should I replace "Tech" with?

Here is my output:

scala> newDF.show(10,false)
+--------------------------------+-------------------------+
|pangaea_customer_id             |persona_model[Tech].score|
+--------------------------------+-------------------------+
|000000E91010441BB122402A45D439E7|0.21678                  |
|000000FB2B304F60B244FEAFDE932640|null                     |
|000003E2565A4C88B9DAADDE5B5ADE71|null                     |
|000009D9D1B3443E95F21C58D708B196|null                     |
|000009EB8F6C4BFABA730726DCFE1FEE|null                     |
|0000119D3561461E96F8BA2B9523579A|null                     |
|00001296DC394AED93A19BBD79A5533C|null                     |
|000014D91E6D4A44AA98E0118E349A52|null                     |
|0000156A2B5D4275980AB9FD4F8C9163|null                     |
|000015EC31FC426E9A5477FE0A857982|1.23                     |
+--------------------------------+-------------------------+

It shows a null score for every id whose map has no "Tech" key, which makes sense because I typed "Tech" in the command above. But I want all the scores, not the null values.

Romal Jaiswal
  • As I said the last time you posted this question, start with exploding the map into two different columns. Then extract the `score` from the struct using `getItem` – philantrovert May 23 '18 at 08:21
  • Can you show the result of `x.show(1, false)` (non-truncated output) and add more information about how the output should look? Maybe take the row you show here and show how you want it to look. – Shaido May 23 '18 at 08:22
  • @Shaido I have edited the question and shown my output. – Romal Jaiswal May 23 '18 at 08:32
  • @philantrovert I have exploded the map as you said, but I am getting the whole struct as an entry in the column and I can't extract the scores from it. Can you please explain how getItem works? – Romal Jaiswal May 23 '18 at 08:34
  • Suppose the structs are inside a column called `value`; you need to use `$"value".getItem("score")`. Please go through the duplicate suggested by @user8371915. You'll find everything there. – philantrovert May 23 '18 at 08:35
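
The approach suggested in the comments (explode the map, then pull `score` out of the struct) could be sketched roughly as follows. This is only an illustration under assumptions: the `Persona` case class, the `ExplodeScores` object, the `extractScores` helper and the sample rows are hypothetical stand-ins that mimic the schema in the question, and a local SparkSession is used so the snippet can run on its own. It uses `getField("score")`, the accessor documented for struct fields, in place of the `getItem("score")` call mentioned in the comment.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-in for the struct stored as the map value (score, tag).
case class Persona(score: Double, tag: String)

object ExplodeScores {
  // Turns each map entry into its own row, then reads `score` from the struct.
  def extractScores(df: DataFrame): DataFrame =
    df
      // explode on a map column produces two columns named "key" and "value"
      .select(col("pangaea_customer_id"), explode(col("persona_model")))
      // "value" is the struct; getField reads one of its fields
      .select(
        col("pangaea_customer_id"),
        col("key"),
        col("value").getField("score").as("score"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-scores")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows shaped like the schema in the question.
    val x = Seq(
      ("id-1", Map("Tech" -> Persona(0.21678, "a")), "2018-05-16-01"),
      ("id-2", Map("Sports" -> Persona(0.5, "b"), "Tech" -> Persona(1.23, "c")), "2018-05-16-01")
    ).toDF("pangaea_customer_id", "persona_model", "process_date")

    extractScores(x).show(false)
    spark.stop()
  }
}
```

The result has one row per (customer, key) pair, so a customer with several map entries appears several times. Note that `explode` drops rows whose map is null or empty; `explode_outer` (available from Spark 2.2) would keep them with null key and score.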

0 Answers