I have a problem parsing some of the strings associated with the 'details' key. The value of the 'details' key contains repeated keys, and each occurrence should be extracted as its own key/value pair.

This is a sample of the JSON data:

{
  "response": {
   "client_log": {
      "data": [
        {
          "login": "AAAAAAAAAAAAAA",
          "state": "MC",
          "details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.",
          "log_number": "1"
        },
        {
          "login": "BBBBBBBBBBBBB",
          "state": "XX",
          "details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user2@gmail.com\r\nServ Id: 767878\r\nProd Num: 34689\r\nProd Unit: Super-B\r\nProd Type: type-B\r\n,Serv Id: 128900\r\nProd Num: 13689\r\nProd Unit: Super-A\r\nProd Type: type-B\r\n,Serv Id: 96333\r\nProd Num: 0011321\r\nProd Unit: Super-C\r\nProd Type: type-C\r\nState: State2\r\nCity: City2\r\n\r\n------Service Information------\r\n\r\nUser Name: Marry\r\nMobile Number: 982130989\r\n\r\n------Reported Form------\r\n\r\nForm-1: xxxxxx\r\nType: 222\r\n\r\nRemarks: Remarks 456.",
          "log_number": "1"
        }
      ],
      "query": "13"
    },
    "response_time": "0.723494",
    "transaction_id": "909122",
    "transaction_status": "OK"

  }
}

From the sample above, please refer to the 'details' key below:

"details": "Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.",

It contains duplicate keys. For example, the keys 'Prod Num', 'Prod Unit' and 'Prod Type' each appear more than once in the example above.

When I read the file, it does not return all the keys required under 'details'. The sample output is as follows:

{
          'city': 'LONDON',
          'login': 'AAAAAAAAAAAAAA',
          'state': 'MC',
          'details': 'Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\n**Serv Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X**\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.',
          'log_number': '1',
          'department': 'Sales',
          'staff_id': 'S123',
          'staff_name': 'EricY',
          'timestamp': '2020-02-27 15:57:24',
          'Email_Id': 'user1@gmail.com',
          'Serv_Id': '112233',
          'Prod_Num': '123456',
          'Prod_Unit': 'Super-B',
          'Prod_Type': 'type-X',
          ',Serv_Id': '003377',
          'State': 'LONDON',
          'City': 'LONDON',
          'User_Name': 'John Clark',
          'Mobile_Number': '000111222',
          'Form-1': 'zzzzz',
          'Type': '111',
          'Remarks': 'Remarks 123.'
        },

As you can see from the output above, I got:

'Serv_Id': '112233', 'Prod_Num': '123456', 'Prod_Unit': 'Super-B', 'Prod_Type': 'type-X' and ',Serv_Id': '003377'

Because the keys are the same, the value of each key is replaced with the last value that was read, so the earlier values are lost. In this case the values that remain are:

Prod Num: 123456, Prod Unit: Super-B and Prod Type: type-X, which are the ones after the key ',Serv_Id': '003377'.

I think this is due to the duplicate keys. Some entries also have more than one ',Serv_Id' key, which means even more duplicate Prod Num, Prod Unit and Prod Type keys that cannot be read properly as key/value pairs. The same keys are always replaced with the latest values.
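
The overwriting described above can be reproduced in a few lines. This is only a minimal sketch: the shortened `details` string and the variable names are illustrative and are not the real data.

```python
# Illustrative, shortened 'details' string (an assumption, not the real data).
details = "Serv Id: 112233\r\nProd Num: 11111\r\n,Serv Id: 445566\r\nProd Num: 22222"

parsed = {}
for line in details.split("\r\n"):
    key, value = line.split(": ", maxsplit=1)
    # Assigning to a key that already exists replaces the value stored earlier.
    parsed[key.replace(" ", "_")] = value

print(parsed)
# {'Serv_Id': '112233', 'Prod_Num': '22222', ',Serv_Id': '445566'}
```

'Prod_Num' ends up holding only the second value, while ',Serv_Id' survives as a separate key only because of its leading comma.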

How can I overcome these duplicate keys? Maybe by renaming each key so that it is unique.

I expect the output to be something like this:

{
          'city': 'LONDON',
          'login': 'AAAAAAAAAAAAAA',
          'state': 'MC',
          'details': 'Please find report below:\r\n\r\n------Report Information------\r\n\r\nEmail Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\nProd Unit: Super-A\r\nProd Type: type-A\r\n,Serv Id: 445566\r\nProd Num: 22222\r\nProd Unit: Super-C\r\nProd Type: type-A\r\n,Serv Id: 003377\r\nProd Num: 123456\r\nProd Unit: Super-B\r\nProd Type: type-X\r\nState: LONDON\r\nCity: LONDON\r\n\r\n------Service Information------\r\n\r\nUser Name: John Clark\r\nMobile Number: 000111222\r\n\r\n------Reported Form------\r\n\r\nForm-1: zzzzz\r\nType: 111\r\n\r\nRemarks: Remarks 123.',
          'log_number': '1',
          'department': 'Sales',
          'staff_id': 'S123',
          'staff_name': 'EricY',
          'timestamp': '2020-02-27 15:57:24',
          'Email_Id': 'user1@gmail.com',
          'Serv_Id': '112233', ------>1st Serv_Id 
          'Prod_Num_1': '11111',--->1st prod_num with new keyname
          'Prod_Unit_1': 'Super-A', --->1st prod_unit with new keyname
          'Prod_Type_1': 'type-A', --->1st prod_type with new keyname
          ',Serv_Id': '003377',------>2nd Serv_Id with new keyname
          'Prod_Num_2': '123456',--->2nd prod_num with new keyname
          'Prod_Unit_2': 'Super-B', --->2nd prod_unit with new keyname
          'Prod_Type_2': 'type-X', ---> 2nd prod_type with new keyname
          'State': 'LONDON',
          'City': 'LONDON',
          'User_Name': 'John Clark',
          'Mobile_Number': '000111222',
          'Form-1': 'zzzzz',
          'Type': '111',
          'Remarks': 'Remarks 123.'
        },

***There can be more than one ',Serv_Id' key.***

Below is the script I used to read the file and extract 'details' into key/value pairs.

for entry in mydata['response']['client_log']['data']:
    parsed_details = {}
    for line in entry['details'].split('\r\n'):
        try:
            key, value = line.split(': ', maxsplit=1)
            parsed_details[key] = value
            parsed_details = { x.translate({32:'_'}) : y  
                for x, y in parsed_details.items()}
        except ValueError:
            pass

    entry.update(parsed_details)

I appreciate your help on this matter. Please guide me. Thank you

chenoi
  • A dictionary does not support duplicate keys. However, if you need to use a dictionary, here is a question with some answers that provide workarounds: https://stackoverflow.com/questions/10664856/make-a-dictionary-with-duplicate-keys-in-python – SSC Apr 02 '20 at 05:00

1 Answer


Edit: I wrote this up pretty late last night, and came back to make a few edits.

In this case, you can use some simple string manipulation to do what you are hoping to do. I made some edits to your original code and have highlighted the differences in the comments of the code.

import json

myfile = 'sample.json'

with open(myfile, 'r') as f:
    mydata = json.load(f)

for entry in mydata['response']['client_log']['data']:
    my_keys = []
    my_values = []
    for line in entry['details'].split('\r\n'):
        # If we find ": " in the line, then it contains a key, value pair
        if ": " in line:
            # Strip the line of whitespace and "," and then split it on the
            # first ":" only, so a value that contains ":" is kept whole
            key, _, value = line.strip().strip(",").partition(":")
            # Add the key to the keys list, and add the value to the values
            my_keys.append(key.replace(" ", "_"))
            my_values.append(value.strip())
    # Keep one counter per key, so each duplicated key is numbered on its own
    seen = {}
    parsed_details = {}
    # For each key and value in the keys and values
    for key, value in zip(my_keys, my_values):
        # If there are duplicates of the given key
        if my_keys.count(key) > 1:
            # Increment this key's counter and add it onto the key
            seen[key] = seen.get(key, 0) + 1
            key = "{}_{}".format(key, seen[key])
        parsed_details[key] = value
    entry.update(parsed_details)
print(mydata)
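
As an alternative to numbered key names, the repeated blocks could be collected into a list of dictionaries, one per 'Serv Id'. This is only a sketch under assumptions that the question does not confirm: every repeated block starts with a 'Serv Id' line, and only 'Prod ...' lines belong to it. The function name `split_details` and the `products` key are hypothetical.

```python
def split_details(details):
    """Split a 'details' string into plain fields and repeated product blocks.

    Assumption: each repeated block starts with a 'Serv Id' line and is
    followed only by 'Prod ...' lines that belong to it.
    """
    fields = {}
    products = []
    for raw in details.split("\r\n"):
        if ": " not in raw:
            continue  # skip headers, blank lines and separators
        # Drop the leading "," and split on the first ":" only
        key, _, value = raw.strip().lstrip(",").partition(":")
        key = key.strip().replace(" ", "_")
        value = value.strip()
        if key == "Serv_Id":
            products.append({key: value})  # start a new block
        elif key.startswith("Prod_") and products:
            products[-1][key] = value      # attach to the current block
        else:
            fields[key] = value            # ordinary, non-repeated field
    fields["products"] = products
    return fields


# Shortened, illustrative sample (not the full data from the question).
sample = (
    "Email Id: user1@gmail.com\r\nServ Id: 112233\r\nProd Num: 11111\r\n"
    ",Serv Id: 445566\r\nProd Num: 22222\r\nState: LONDON"
)
result = split_details(sample)
print(result["products"])
# [{'Serv_Id': '112233', 'Prod_Num': '11111'}, {'Serv_Id': '445566', 'Prod_Num': '22222'}]
```

With this layout the number of 'Serv Id' blocks can vary per entry without inventing new key names, at the cost of a nested structure instead of a flat dictionary.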
Tristen
  • Hi sir, it returns the output dictionaries as {}{}, where it should return multiple dictionaries as {},{} with a comma between each dictionary. – chenoi Apr 02 '20 at 12:42
  • @Tristen Thanks for contributing to Stack Overflow. Your answer will be more helpful if you explain in words how it solves the OPs question. – Code-Apprentice Apr 02 '20 at 14:04
  • @chenoi I made a small edit. I was printing out the entries individually so that you could see them as you had shown. I removed print(entry) and instead at the end put print(mydata) This code should do what you are asking for, and if not, could you perhaps offer more clarity as to what the outputs should be? – Tristen Apr 02 '20 at 19:18
  • Hi sir, thanks for the clarification. I have changed it accordingly; I just noticed it. I need to test it with a few other JSON data files. I will get back to you. Thanks – chenoi Apr 02 '20 at 23:19
  • The code above will check all the keys in the dict, right? If any key appears more than once, it will be renamed accordingly, because the only keys that may be duplicated are Serv_Id, Prod_Num, Prod_Unit and Prod_Type. – chenoi Apr 02 '20 at 23:31
  • That's correct. It checks every key to see whether it appears more than once. If it does, it will name the first one with a _1, the second with a _2, and so on. If there are no duplicates, it will just keep the key with the same name that it had originally. – Tristen Apr 02 '20 at 23:33
  • Noted, and thank you sir. I need to test it with actual data and I will update accordingly. Your code works like a charm, thanks – chenoi Apr 03 '20 at 00:05
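
The naming rule described in the comments (first occurrence gets _1, second gets _2, unique keys stay unchanged) can be checked in isolation. This is a small sketch; the helper name `number_duplicates` and the sample pairs are hypothetical.

```python
from collections import Counter


def number_duplicates(pairs):
    """Return a dict where keys occurring more than once get _1, _2, ... suffixes."""
    totals = Counter(key for key, _ in pairs)  # how often each key occurs overall
    seen = Counter()                           # how often each key has been seen so far
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            key = "{}_{}".format(key, seen[key])
        result[key] = value
    return result


# Illustrative pairs (not the full data from the question).
pairs = [("Serv_Id", "112233"), ("Prod_Num", "11111"),
         ("Serv_Id", "445566"), ("Prod_Num", "22222"), ("City", "LONDON")]
print(number_duplicates(pairs))
# {'Serv_Id_1': '112233', 'Prod_Num_1': '11111', 'Serv_Id_2': '445566', 'Prod_Num_2': '22222', 'City': 'LONDON'}
```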