-1

I have a JSON file which contains duplicate records and I need to send the unique records to another application. In order to get the unique records, first I need to merge JRNAL_NO & JRNAL_LINE fields separated a hyphen (ex: 655-1) and then use this key to identify the unique records. Is this possible to do via a python script?

Source

Target

Thank you, John

John
  • 1

2 Answers2

0

I managed to get the expected results using the below script (thanks to some old posts in SO). I am new to Python, so if there are better ways of doing this, kindly suggest. Thanks.

import json
source = json.loads(input_var)
target = []
seen = set()

for record in source:
    name = record['JRNAL_NO'] + record['JRNAL_LINE']
    if name not in seen:
        seen.add(name)
        target.append(record)
del seen

output_var = json.dumps(target)
John
  • 1
-1

In order to solve this problem, we can gather the dictionaries in a list and then apply a list-comprehension to eliminate duplicates:

dictA = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "1", "D_C": "C"}
dictB = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictC = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictD = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}

list_of_dicts = [dictA, dictB, dictC, dictD]

result = []
[result.append(x) for x in list_of_dicts if x not in result]

print(result)

This will return a list of the non-reapeted dictionaries:

[{'PERIOD': '2022007', 'JRNAL_NO': '655', 'JRNAL_LINE': '1', 'D_C': 'C'}, {'PERIOD': '2022007', 'JRNAL_NO': '655', 'JRNAL_LINE': '3', 'D_C': 'C'}]

Extra relevant links:

List Comprehension

  • 0. Don't answer questions that show no effort (since such questions are basically "implement this for me", which is [off-topic](/help/on-topic) here and answering encourages more) [Answer] 1. Don't use a list comprehension for [side-effects](/q/5753597/843953). 2. OP wants to check for duplicates using only `d['JRNAL_NO'] + "-" + d['JRNAL_LINE']`, not the entire `d`. 3. Your approach is very sub-optimal because it checks for membership in the `result` list which is O(N), wrapped in a loop so it makes your code O(N^2). It's much better to create a dict with those keys and the unique values. – Pranav Hosangadi May 04 '22 at 18:52