I am trying to parse JSON records using Groovy script which is called through NiFi. Based on the input I got from Nested JSON with duplicate keys, it was working fine. But my groovy script got failed for few records if the value column has double quotes (") in it.
E.g.,
"CT":"text/javascript; charset="UTF-8""
To fix the above issue, I replaced all the double quotes (") from the record and made the record look like,
CT:text/javascript; charset=UTF-8
and used JsonParserType.LAX while parsing the JSON. Then the issue got resolved. But I faced another problem where incoming JSON colunn value was having comma (,) in between
UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75,app_store:googleplay,app_uuid:c6bed4
For the above value,
After this piece of code got executed,
new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)
Output looks like,
UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML,
and remaining values of the "UA" column is just coming separately like below without any key,
537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75
and this part is completely missed in the parsed output
like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/
Below is my complete code with sample json record,
import groovy.json.*
import java.util.function.*
/* Example record for extra double quotes in the column value */
// def json = '{"transaction":[{"TS":"2020-02-04 07:02:55.074013","V":"v8","PID":"p31177","RS":"359","SR":"434","CnID":"0/1","CIPG":{"CIP":"10.176.63.122","CP":"35073","SLP":"35073","SLEP":"35073","CVID":"410"},"SIPG":{"SIP":"54.93.254.233","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"73","R":"73","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"73"},"AS":"365","OS":"365","CPr":"0","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"-","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"S","Flag23":"c","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"s","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"j","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"365","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"31/31/31 21 1/1 0.0/0.0","CT":"text/javascript; charset="UTF-8"","MS":"447715522518","MSH":"1C5SujOJcZs4SCCoUSQSynOXaCyND1q4TQ30kcryo/Y=","SID":"v4.172.16.146.148:9EE6EC7A17108A46","SuID":"-","UA":"okhttp/3.9.1","DID":"Default-Android-Web-Unknown","UAG":"Android-Web","CID":"TECH","HR":"-","CRG":"3001010006000000","CP1":"8232F40186F032F40107B06F6E","CP2":"3599650837621501","AIDF":"1:1386","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"GET","HD":"pubsub.pubnub.com","HP":"/v2/presence/sub-key/sub-c-ee729d78-6233-11e2-b80b-12313f022c90/channel/uk9_18395/heartbeat","HQ":"requestid=5bd3d724-84f7-4f91-865c-a141136e537f&state=%257B%257D&heartbeat=300&pnsdk=PubNub-Java-Unified/4.19.0&uuid=600752db-00ad-4478-9688-b8ee2b13414f"},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'
/* Example record for comma in the column value */
def json = '{"transaction":[{"TS":"2020-02-04 07:04:50.478718","V":"v8","PID":"p12318","RS":"4087","SR":"4162","CnID":"0/89","CIPG":{"CIP":"10.146.84.250","CP":"52106","SLP":"18518","SLEP":"18518","CVID":"410"},"SIPG":{"SIP":"104.18.14.201","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"36","R":"36","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"36"},"AS":"463","OS":"480","CPr":"-3","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"c","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"F","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"G","Flag23":"G","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"b","Flag4":"T","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"5","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"480","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"35/35/35 18 1/2 0.0/0.0","CT":"application/json;charset=UTF-8","MS":"447707123573","MSH":"G7RPssPDwUd1pKsRNs8KFtSEIZGIyJktwNDWboHmXds=","SID":"v4.172.16.138.148:9EE6E0766C39E5CC","SuID":"-","UA":"Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={"aaid":"9541535b-ab50-41c7-9ae4-4d729b501d75","app_store":"googleplay","app_uuid":"c6bed4","DID":"Android mobile SM-","UAG":"Android-Web","CID":"bm_unks","HR":"-","CRG":"0401020006000000","CP1":"8232F401C31032F4010018D678","CP2":"3562611081227201","AIDF":"6:5","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"POST","HD":"ire-d01.fordealdc.com","HP":"/api/dotData","HQ":""},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]').replace('"','').replace(':-,',':,').replace(':-}',':}')
def jsonRecord = new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)
println "After parsing: " + jsonRecord
static def flatten(row, prefix="") {
def flattened = new HashMap<String, String>()
println "Input: " + row
row.each { String k, Object v ->
def key = prefix ? prefix + "_" + k : k;
if (v instanceof Map) {
flattened.putAll(flatten(v, k))
} else {
flattened.put(key,v.toString())
}
}
println "Output: " +flattened
return flattened
}
static def toCSVRow(HashMap row) {
def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"]
return columns.collect { column ->
return row.containsKey(column) ? row.get(column) : ""
}.join(',')
}
def a = jsonRecord.transaction.collect { row ->
return flatten(row)
}.collect { row ->
return toCSVRow(row).toString().replace('[','').replace(']','')
}.join('\n')
println a
I need advise on how to handle this. All I want to know is how my Groovy script can parse the data properly for all the JSON records.