I am having trouble parsing the string stored under the 'details' key. Its value contains repeated field names, and each occurrence should be extracted as its own key/value pair.
This is a sample of the JSON data:
{
"response": {
"client_log": {
"data": [
{
"login": "AAAAAAAAAAAAAA",
"state": "MC",
"details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.",
"log_number": "1"
},
{
"login": "BBBBBBBBBBBBB",
"state": "XX",
"details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user2@gmail.com\r\nServ Id: 767878\r\nProd Num: 34689\r\nProd Unit: Super-B\r\nProd Type: type-B\r\n,Serv Id: 128900\r\nProd Num: 13689\r\nProd Unit: Super-A\r\nProd Type: type-B\r\n,Serv Id: 96333\r\nProd Num: 0011321\r\nProd Unit: Super-C\r\nProd Type: type-C\r\nState: State2\r\nCity: City2\r\n\r\n------Service Information------\r\n\r\nUser Name: Marry\r\nMobile Number: 982130989\r\n\r\n------Reported Form------\r\n\r\nForm-1: xxxxxx\r\nType: 222\r\n\r\nRemarks: Remarks 456.",
"log_number": "1"
}
],
"query": "13"
},
"response_time": "0.723494",
"transaction_id": "909122",
"transaction_status": "OK"
}
}
From the sample above, please look at the 'details' value of the first entry:
"details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.",
It contains duplicate keys: for example, 'Prod Num', 'Prod Unit' and 'Prod Type' each appear three times in the example above.
When I read the file, the result does not contain all the key/value pairs found under 'details'. Sample output:
{
'city': 'LONDON',
'login': 'AAAAAAAAAAAAAA',
'state': 'MC',
'details': 'Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\n**Serv Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X**\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.',
'log_number': '1',
'department': 'Sales',
'staff_id': 'S123',
'staff_name': 'EricY',
'timestamp': '2020-02-27 15:57:24',
'Email_Id': 'user1@gmail.com',
'Serv_Id': '112233',
'Prod_Num': '123456',
'Prod_Unit': 'Super-B',
'Prod_Type': 'type-X',
',Serv_Id': '003377',
'State': 'LONDON',
'City': 'LONDON',
'User_Name': 'John Clark',
'Mobile_Number': '000111222',
'Form-1': 'zzzzz',
'Type': '111',
'Remarks': 'Remarks 123.'
},
As you can see from the output above, I got
'Serv_Id': '112233', 'Prod_Num': '123456', 'Prod_Unit': 'Super-B', 'Prod_Type': 'type-X' and ',Serv_Id': '003377'
Because the keys are identical, the value of each key is replaced by the last one parsed, so the earlier values are lost. In this case the surviving values are:
Prod Num: 123456, Prod Unit: Super-B and Prod Type: type-X, which follow the key ',Serv_Id': '003377'
I think this is caused by the duplicate keys. Some entries contain more than one ',Serv Id' line, which means even more repeated Prod Num, Prod Unit and Prod Type lines that cannot be kept as separate key/value pairs. Each repeated key ends up holding only the latest value.
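To illustrate what I believe is happening, here is a minimal sketch. The `lines` list is made up from the three 'Prod Num' values in the sample, and the key handling is simplified compared to my script:

```python
# Assigning to a key that already exists in a dict replaces its value,
# so only the last 'Prod Num' survives.
lines = ["Prod Num: 11111", "Prod Num: 22222", "Prod Num: 123456"]

parsed = {}
for line in lines:
    key, value = line.split(": ", maxsplit=1)
    parsed[key.replace(" ", "_")] = value  # same key on every iteration

print(parsed)  # {'Prod_Num': '123456'}
```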
How can I overcome these duplicate keys? Maybe by renaming each repeated key so that it becomes unique.
I expect output something like the following:
{
'city': 'LONDON',
'login': 'AAAAAAAAAAAAAA',
'state': 'MC',
'details': 'Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.',
'log_number': '1',
'department': 'Sales',
'staff_id': 'S123',
'staff_name': 'EricY',
'timestamp': '2020-02-27 15:57:24',
'Email_Id': 'user1@gmail.com',
'Serv_Id': '112233', ------>1st Serv_Id
'Prod_Num_1': '11111',--->1st prod_num with new keyname
'Prod_Unit_1': 'Super-A', --->1st prod_unit with new keyname
'Prod_Type_1': 'type-A', --->1st prod_type with new keyname
',Serv_Id': '003377',------>2nd Serv_Id with new keyname
'Prod_Num_2': '123456',--->2nd prod_num with new keyname
'Prod_Unit_2': 'Super-B', --->2nd prod_unit with new keyname
'Prod_Type_2': 'type-X', ---> 2nd prod_type with new keyname
'State': 'LONDON',
'City': 'LONDON',
'User_Name': 'John Clark',
'Mobile_Number': '000111222',
'Form-1': 'zzzzz',
'Type': '111',
'Remarks': 'Remarks 123.'
},
***The ',Serv_Id' key can occur more than once.***
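One direction I can imagine for the renaming, as a rough sketch only: collect all pairs first, then append a running number to every key that occurs more than once. The function name `parse_details_numbered` is hypothetical. Unlike the expected output above, this sketch strips the leading comma and also numbers the repeated Serv_Id keys, which is an assumption on my part about what would be acceptable.

```python
from collections import Counter


def parse_details_numbered(details):
    """Parse 'Key: value' lines; number keys that occur more than once."""
    pairs = []
    for line in details.split("\r\n"):
        if ": " not in line:
            continue  # headings, blank lines, etc.
        key, value = line.split(": ", maxsplit=1)
        # Assumption: a leading ',' only marks a repeated block.
        key = key.lstrip(",").replace(" ", "_")
        pairs.append((key, value))

    totals = Counter(key for key, _ in pairs)  # occurrences per key
    seen = Counter()                           # running index per key
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            result[f"{key}_{seen[key]}"] = value
        else:
            result[key] = value
    return result


sample = ("Email Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\n"
          ",Serv Id: 445566\r\nProd Num: 22222\r\nState: LONDON")
numbered = parse_details_numbered(sample)
print(numbered)
```

Keys that appear only once (such as Email_Id or State) keep their plain names, so only the repeated fields change.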
Below is the script I used to read the file and extract 'details' into key/value pairs.
for entry in mydata['response']['client_log']['data']:
    parsed_details = {}
    for line in entry['details'].split('\r\n'):
        try:
            key, value = line.split(': ', maxsplit=1)
            # A repeated key overwrites the value stored earlier.
            parsed_details[key] = value
            # Replace spaces (code point 32) in every key with underscores.
            parsed_details = {x.translate({32: '_'}): y
                              for x, y in parsed_details.items()}
        except ValueError:
            # Lines without a ': ' separator are skipped.
            pass
    entry.update(parsed_details)
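Another option I could imagine, instead of renaming keys, is to keep each Serv Id block as its own dictionary inside a list. This is only a sketch under assumptions: `parse_details_grouped`, `GROUP_KEYS` and the `services` key are hypothetical names, and I assume every repeated block starts with a 'Serv Id' line and contains only the four fields listed.

```python
# Assumed set of fields that repeat once per service block.
GROUP_KEYS = {"Serv_Id", "Prod_Num", "Prod_Unit", "Prod_Type"}


def parse_details_grouped(details):
    """Return flat fields plus a 'services' list with one dict per Serv Id."""
    flat = {}
    services = []
    for line in details.split("\r\n"):
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # no ': ' separator on this line
        key = key.lstrip(",").replace(" ", "_")
        if key == "Serv_Id":
            services.append({})  # a new block starts here
        if key in GROUP_KEYS and services:
            services[-1][key] = value
        else:
            flat[key] = value
    flat["services"] = services
    return flat


sample = ("Email Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\n"
          ",Serv Id: 445566\r\nProd Num: 22222\r\nState: LONDON")
grouped = parse_details_grouped(sample)
print(grouped)
```

With this layout no value is overwritten, however many ',Serv Id' blocks an entry has, and calling `entry.update(parse_details_grouped(entry['details']))` would add the list under one key.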
I would appreciate your help with this. Please guide me. Thank you.