0

I have a data frame with a large string field in it with a load of questionnaire questions and answers held in a json Dictionary style pairs, the questionnaires vary in length and do not always the same questions.

Using Spark SQL how can I get the questions and answers out of the string field Responses so i have a 3 column list of "CustomerID, Questions, Answers"

So from this

CustomerID     Responses
1              [{"question1":"answer 1"},{"question 2":"answer2"}]
2              [{"question1":"answer 1a"},{"question 2":"answer2b"}]
3              [{"question1":"answer 1b"},{"question 3":"answer3"}]

to resolve to

CustomerID   Questions  Answers
1            question1  answer1
1            question2  answer2
2            question1  answer1a
2            question2  answer2b 
3            question1  answer1b 
3            question3  answer3 

I'm looking at lateral view explode but i just cant see how to use it

%sql
select r.ID,
       lat.question,
       lat.answer

from table r 
   lateral view outer explode(r.responses) lat as question, answer

error says cannot resolve 'explode(r.responses)' due to data type mismatch: input to function explode should be array or map type, not string'

Jim
  • 569
  • 6
  • 15

2 Answers2

0

try these resources:

JSON SQL Processing MSDN

SQLSHack Example

after reading that, it would be simple to create the SQL query

Dean Van Greunen
  • 5,060
  • 2
  • 14
  • 28
  • All the examples show ```label : value``` where as the data i have is ```value:value``` pairs how do i extract that out without explicitly calling the label value because I want to group them into 2 columns not many columns – Jim Sep 28 '21 at 14:40
0

You can parse json data while selecting from sql See example

      Select * from parseJSON('{
           "Person":
             {
                "firstName": "John",
                "lastName": "Smith",
                "age": 25,
               }
           "Address":
              {
                 "streetAddress":"21 2nd Street",
                 "city":"New York",
                 "state":"NY",
                "postalCode":"10021"
              },
           "PhoneNumbers":
             {
                 "home":"212 555-1234",
                 "fax":"646 555-4567"
               }
         }
   }  )

Use parseJSON() procedure

Parse JSON in TSQL

Chukwu Remijius
  • 323
  • 1
  • 14