
I have JSON that looks like this:

{
  "course1": [
    {
      "courseName": "test",
      "section": "123",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course2": [
    {
      "courseName": "test",
      "section": "456",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course2": [
    {
      "courseName": "test",
      "section": "789",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course2": [
    {
      "courseName": "test",
      "section": "1011",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course3": [
    {
      "courseName": "test",
      "section": "1213",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course3": [
    {
      "courseName": "test",
      "section": "1415",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ]
}

I want to combine all the blocks/objects/lists (I don't know what they're called) that have the same key, like this:

{
  "course1": [
    {
      "courseName": "test",
      "section": "123",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course2": [
    {
      "courseName": "test",
      "section": "456",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    },
    {
      "courseName": "test",
      "section": "789",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    },
    {
      "courseName": "test",
      "section": "1011",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ],
  "course3": [
    {
      "courseName": "test",
      "section": "1213",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    },
    {
      "courseName": "test",
      "section": "1415",
      "academicHours": "3",
      "day1": "1",
      "room1": "0145 03 1 B 015"
    }
  ]
}

How can I do this using a regular expression in Python, or with any regular expression query?

Also, I tried to use json.dumps() and work my way from there, but for some reason, when I use it with any JSON that contains Arabic characters, it freaks out and messes up the whole thing. So I'm stuck with regular expressions, unfortunately.

Thank you for your help :)

Barmar
Waleed Alenazi
  • 3
    The first object is impossible, an object can't have duplicate keys. Are you sure it's not an array of objects? – Barmar Jan 25 '19 at 00:52
  • @Barmar not impossible.. just bad json – wim Jan 25 '19 at 01:21
  • 1
    @wim I meant that the JSON doesn't correspond to a possible object. – Barmar Jan 25 '19 at 01:30
  • 2
    I know what you mean, but you're wrong. [Does JSON syntax allow duplicate keys in an object?](https://stackoverflow.com/q/21832701/674039) – wim Jan 25 '19 at 01:32
  • @Barmar I made this JSON with a function that converts some data from a PDF file to JSON, which is why it's bad JSON. I asked this question because I want to fix it. – Waleed Alenazi Jan 25 '19 at 11:53
  • No JSON library should create invalid JSON. That's why I always tell people to use libraries to create JSON rather than trying to build it themselves using string operations. – Barmar Jan 25 '19 at 18:03
  • @Barmar I didn't find any library that creates JSON from a PDF and a local HTML table. I found one that takes a URL only, not a local file. – Waleed Alenazi Jan 26 '19 at 01:55
  • You should make a dictionary or list from the PDF, then use `json.dump()` on that. – Barmar Jan 26 '19 at 01:56
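
The suggestion in the last comment could look roughly like the sketch below. The `rows` list is a hypothetical stand-in for whatever the PDF/HTML extraction step produces (it is not from the original post), and the sample values are shortened. The idea is to group sections under each course key in a dictionary first and only serialize at the end, so duplicate keys never appear in the output.

```python
import json

# Hypothetical (course key, section record) pairs, standing in for the
# output of the PDF/HTML table extraction step.
rows = [
    ("course1", {"courseName": "test", "section": "123"}),
    ("course2", {"courseName": "test", "section": "456"}),
    ("course2", {"courseName": "test", "section": "789"}),
]

# Group the records by course key; each key ends up with one list.
courses = {}
for key, record in rows:
    courses.setdefault(key, []).append(record)

# Serialize once at the end. ensure_ascii=False keeps non-ASCII text
# (for example Arabic) readable instead of \uXXXX escapes.
text = json.dumps(courses, indent=2, ensure_ascii=False)
print(text)
```

`json.dump(courses, f, ensure_ascii=False)` does the same thing but writes straight to an open file object `f`.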

1 Answer


The stdlib `json` module offers a hook, `object_pairs_hook`, that lets you decode objects with duplicate keys. This simple "extend" hook should work for your example data:

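import json

def myhook(pairs):
    # json calls this with the (key, value) pairs of each decoded object
    d = {}
    for k, v in pairs:
        if k not in d:
            d[k] = v
        else:
            # duplicate key: extend the list already stored under it
            d[k] += v
    return d

# bad_json is the JSON text with duplicate keys, as a str
mydata = json.loads(bad_json, object_pairs_hook=myhook)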
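
As a quick check of the hook idea, here is a minimal self-contained sketch. The function name `merge_duplicate_keys` and the shortened `bad_json` sample are illustrative assumptions, not part of the original post. It also assumes that every duplicated key maps to a list, as in the question's data; other value types would need different handling.

```python
import json

def merge_duplicate_keys(pairs):
    """object_pairs_hook that concatenates list values sharing a key."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            # Assumption: values under duplicated keys are lists.
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged

# Shortened sample with a duplicated "course2" key.
bad_json = """
{
  "course1": [{"section": "123"}],
  "course2": [{"section": "456"}],
  "course2": [{"section": "789"}]
}
"""

mydata = json.loads(bad_json, object_pairs_hook=merge_duplicate_keys)
print(json.dumps(mydata, indent=2))
```

The hook is called for every JSON object in the input, including the inner section records; those have no duplicate keys, so they pass through unchanged.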

Although there's nothing in the JSON specification to disallow duplicate keys, they SHOULD probably be avoided in the first place:

1.1. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

...

  4. Objects

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

Community
wim
  • It did work, thank you so much. The problem is that I can only save it as a string by using json.dumps(), and as I said, that messes up any JSON that contains Arabic characters. I guess I'll have to figure something out. Thanks – Waleed Alenazi Jan 25 '19 at 12:01
  • Pass `ensure_ascii=False` when you call `json.dumps`. – wim Jan 25 '19 at 16:45
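
To illustrate the last comment: by default `json.dumps` escapes every non-ASCII character as `\uXXXX`, which is still valid JSON but looks garbled. A small sketch, where the Arabic course name is a made-up sample value rather than data from the question:

```python
import json

# Made-up sample record containing Arabic text.
data = {"course1": [{"courseName": "رياضيات", "section": "123"}]}

escaped = json.dumps(data)                        # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)   # keeps the characters as-is

print(escaped)
print(readable)

# Both strings decode back to the same data; only the text form differs.
assert json.loads(escaped) == json.loads(readable) == data
```

When saving the readable form to disk, open the file with an explicit encoding, for example `open(path, "w", encoding="utf-8")`, so the characters are written correctly regardless of the platform default.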