I have a simple JSON array that I can read into a Spark DataFrame. I would like to wrap all of its columns under a custom root tag. More precisely, I want the exact opposite of explode: collapsing all rows of the DataFrame into a single custom root column.

Initial JSON data:
[{"tpeKeyId":"301461865","acctImplMgrId":null,"acctMgrId":null,"agreCancDt":null,"agreEffDt":null,"pltfrmNm":"EMPLOYEE NAVIGATOR","premPyRmtInd":null,"recCrtTs":"2016-11-08 13:01:44.290418","recCrtUsrId":"testedname","recUpdtTs":"2018-10-16 12:16:21.579446","recUpdtUsrId":"testname","spclInstrFormCd":null,"sysCd":null,"tpeNm":"EMPLOYEE NAVIGATOR","univPrdcrId":"9393939393"},{"tpeKeyId":"901972280","acctImplMgrId":null,"acctMgrId":null,"agreCancDt":null,"agreEffDt":null,"pltfrmNm":"datalion","premPyRmtInd":null,"recCrtTs":"2018-12-10 01:36:14.925833","recCrtUsrId":"exactlydata","recUpdtTs":"2018-12-10 01:36:14.925833","recUpdtUsrId":"datalion        ","spclInstrFormCd":null,"sysCd":null,"tpeNm":"lialion","univPrdcrId":"89899898989"}]

First Dataframe:

+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+
|acctImplMgrId|acctMgrId|agreCancDt|agreEffDt|pltfrmNm          |premPyRmtInd|recCrtTs                  |recCrtUsrId|recUpdtTs                 |recUpdtUsrId    |spclInstrFormCd|sysCd|tpeKeyId |tpeNm             |univPrdcrId|
+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+
|null         |null     |null      |null     |EMPLOYEE NAVIGATOR|null        |2016-11-08 13:01:44.290418|testedname |2018-10-16 12:16:21.579446|testname        |null           |null |301461865|EMPLOYEE NAVIGATOR|9393939393 |
|null         |null     |null      |null     |datalion          |null        |2018-12-10 01:36:14.925833|exactlydata|2018-12-10 01:36:14.925833|datalion        |null           |null |901972280|lialion           |89899898989|
+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+

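After concatenating the root tag manually:

    import spark.implicits._  // needed for .toDS()

    // fileContents holds the raw JSON array text shown above
    val addingRootTag = "{ \"roottag\" :" + fileContents + "}"
    val rootTagDf = spark.read.json(Seq(addingRootTag).toDS())
    rootTagDf.show(false)

Second DataFrame: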
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|roottag                                                                                                                                                                                                                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[,,,, EMPLOYEE NAVIGATOR,, 2016-11-08 13:01:44.290418, testedname, 2018-10-16 12:16:21.579446, testname,,, 301461865, EMPLOYEE NAVIGATOR, 9393939393], [,,,, datalion,, 2018-12-10 01:36:14.925833, exactlydata, 2018-12-10 01:36:14.925833, datalion        ,,, 901972280, lialion, 89899898989]]|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Question: does the Spark API provide a built-in method that avoids manually concatenating the root tag and instead wraps the first DataFrame so that it looks like the second DataFrame? In other words, the exact opposite of explode.
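For illustration, the wrapping described above might be approximated with the standard `struct` and `collect_list` functions: each row is packed into a struct, and all structs are gathered into one array column. This is only a sketch under assumptions. The `WrapInRoot` object, the `wrapInRoot` helper, and the two-column sample data are hypothetical names and values made up for the example, and `collect_list` does not guarantee row order on a distributed DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object WrapInRoot {
  // Hypothetical helper (not part of Spark): packs every row into a struct,
  // then aggregates all structs into a single array column named rootName.
  // Assumes plain column names without dots or backticks.
  def wrapInRoot(df: DataFrame, rootName: String): DataFrame =
    df.agg(collect_list(struct(df.columns.map(col): _*)).as(rootName))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wrap-in-root")
      .getOrCreate()
    import spark.implicits._

    // Reduced stand-in for the first DataFrame (only two of its columns).
    val firstDf = Seq(
      ("301461865", "EMPLOYEE NAVIGATOR"),
      ("901972280", "lialion")
    ).toDF("tpeKeyId", "tpeNm")

    // One row, one column: roottag is an array of structs.
    val rootTagDf = wrapInRoot(firstDf, "roottag")
    rootTagDf.printSchema()
    rootTagDf.show(false)

    spark.stop()
  }
}
```

Because the aggregation has no grouping key, the result is a single row, which matches the shape of the second DataFrame. Applying `explode` to `roottag` and selecting the struct fields should give back the original rows.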

naveen p
  • Can you give an example of what you are trying to do? What have you done thus far, and what is the expected output? – Aaron Jun 10 '19 at 21:15
  • Please note that explaining in words without a proper example won't help answerers, or you. It should be a reproducible example to get a solid answer. Please take care next time. – Ram Ghadiyaram Jun 10 '19 at 21:16
  • Please provide some input/output data as described here https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples – abiratsis Jun 11 '19 at 13:32
  • Please let me know if the question is still unclear. – naveen p Jun 18 '19 at 15:04

0 Answers