0

I am trying to parse JSON records using Groovy script which is called through NiFi. Based on the input I got from Nested JSON with duplicate keys, it was working fine. But my groovy script got failed for few records if the value column has double quotes (") in it.

E.g.,
"CT":"text/javascript; charset="UTF-8""

To fix the above issue, I replaced all the double quotes (") from the record and made the record look like,

CT:text/javascript; charset=UTF-8

and used JsonParserType.LAX while parsing the JSON. Then the issue got resolved. But I faced another problem where incoming JSON colunn value was having comma (,) in between

UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75,app_store:googleplay,app_uuid:c6bed4

For the above value,

After this piece of code got executed,
new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)

Output looks like,
UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML,

and remaining values of the "UA" column is just coming separately like below without any key,

537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75

and this part is completely missed in the parsed output

 like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/

Below is my complete code with sample json record,

import groovy.json.*
import java.util.function.*
/* Example record for extra double quotes in the column value */
// def json = '{"transaction":[{"TS":"2020-02-04 07:02:55.074013","V":"v8","PID":"p31177","RS":"359","SR":"434","CnID":"0/1","CIPG":{"CIP":"10.176.63.122","CP":"35073","SLP":"35073","SLEP":"35073","CVID":"410"},"SIPG":{"SIP":"54.93.254.233","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"73","R":"73","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"73"},"AS":"365","OS":"365","CPr":"0","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"-","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"S","Flag23":"c","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"s","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"j","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"365","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"31/31/31 21 1/1 0.0/0.0","CT":"text/javascript; charset="UTF-8"","MS":"447715522518","MSH":"1C5SujOJcZs4SCCoUSQSynOXaCyND1q4TQ30kcryo/Y=","SID":"v4.172.16.146.148:9EE6EC7A17108A46","SuID":"-","UA":"okhttp/3.9.1","DID":"Default-Android-Web-Unknown","UAG":"Android-Web","CID":"TECH","HR":"-","CRG":"3001010006000000","CP1":"8232F40186F032F40107B06F6E","CP2":"3599650837621501","AIDF":"1:1386","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"GET","HD":"pubsub.pubnub.com","HP":"/v2/presence/sub-key/sub-c-ee729d78-6233-11e2-b80b-12313f022c90/channel/uk9_18395/heartbeat","HQ":"requestid=5bd3d724-84f7-4f91-865c-a141136e537f&state=%257B%257D&heartbeat=300&pnsdk=PubNub-Java-Unified/4.19.0&uuid=600752db-00ad-4478-9688-b8ee2b13414f"},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'
/* Example record for comma in the column value */
def json = '{"transaction":[{"TS":"2020-02-04 07:04:50.478718","V":"v8","PID":"p12318","RS":"4087","SR":"4162","CnID":"0/89","CIPG":{"CIP":"10.146.84.250","CP":"52106","SLP":"18518","SLEP":"18518","CVID":"410"},"SIPG":{"SIP":"104.18.14.201","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"36","R":"36","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"36"},"AS":"463","OS":"480","CPr":"-3","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"c","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"F","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"G","Flag23":"G","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"b","Flag4":"T","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"5","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"480","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"35/35/35 18 1/2 0.0/0.0","CT":"application/json;charset=UTF-8","MS":"447707123573","MSH":"G7RPssPDwUd1pKsRNs8KFtSEIZGIyJktwNDWboHmXds=","SID":"v4.172.16.138.148:9EE6E0766C39E5CC","SuID":"-","UA":"Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={"aaid":"9541535b-ab50-41c7-9ae4-4d729b501d75","app_store":"googleplay","app_uuid":"c6bed4","DID":"Android mobile SM-","UAG":"Android-Web","CID":"bm_unks","HR":"-","CRG":"0401020006000000","CP1":"8232F401C31032F4010018D678","CP2":"3562611081227201","AIDF":"6:5","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"POST","HD":"ire-d01.fordealdc.com","HP":"/api/dotData","HQ":""},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'

def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]').replace('"','').replace(':-,',':,').replace(':-}',':}')
def jsonRecord = new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)
println "After parsing: " + jsonRecord
static def flatten(row, prefix="") {
    def flattened = new HashMap<String, String>()
    println "Input: " + row
    row.each { String k, Object v ->
        def key = prefix ? prefix + "_" + k : k;

        if (v instanceof Map) {
            flattened.putAll(flatten(v, k))
        } else {
            flattened.put(key,v.toString())
        }
    }

    println "Output: " +flattened
    return flattened
}

static def toCSVRow(HashMap row) {
    def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"] 

    return columns.collect { column ->
        return row.containsKey(column) ? row.get(column) : ""
    }.join(',')
}

def a = jsonRecord.transaction.collect { row ->
    return flatten(row)
}.collect { row ->
    return toCSVRow(row).toString().replace('[','').replace(']','')
}.join('\n')

println a

I need advise on how to handle this. All I want to know is how my Groovy script can parse the data properly for all the JSON records.

  • 1
    Why? Why do you bend over backwards to make broken JSON work? Don't bang strings together to create JSON. Or if you absolutely must, then at least do the quoting properly where you create those strings (replace `"` with `\"`) – cfrick Feb 08 '20 at 14:20
  • I am not creating the JSON. Source is providing the data like this which i need to parse and load it to target. Is there is any option to parse it properly – Sriraam Venkataraman Feb 08 '20 at 15:56
  • 1
    Don't put up with this; tell your source, they are producing broken JSON. If they are unable to produce conformant JSON, they should just invent their own format. – cfrick Feb 08 '20 at 16:01
  • @sriraam-venkataraman, show please the json source example. – daggett Feb 09 '20 at 12:00
  • It's available in the above code. def json= – Sriraam Venkataraman Feb 09 '20 at 15:38

1 Answers1

0

As the comments to your question mention, the content that you’re attempting to parse is not JSON. The best approach is to start from scratch and find the commonalities in the data.

If each name/value pair is on a separate line (separated by /r or /n) then line break the content to get a list.

Since each pair uses an ‘=‘ to separate them, next iterate through that list and break the line using the separator. Trim each side and you have your name/value pair.

Edit: You’ll need to walk through the map you get as a result and parse it to see if there are arrays or objects being defined and massage the data to accommodate that.