0

I am curious to know how I can read this JSON file into a Pandas dataframe and set new headers as my source does not have any. I am trying to get date, street, suburb as the headers.

As an example. Kent Street is the suburb and Karawara is the suburb

{
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
        "Kitchener Road": ["Alfred Cove"],
        "Alexander Road": ["Rivervale"],
        "Kwinana Freeway": ["Wellard"],
    },
    "26 March 2019": {
        "Great Eastern Highway": ["Sawyers Valley", "Redcliffe"],
        "South Western Highway": ["Armadale", "Wungong"],
        "Great Northern Highway": ["Muchea", "Baskerville"],
        "St Thomas Primary": ["Claremont"],
        "Stirling Highway": ["Claremont"],
        "Grovelands Primary": ["Camillo"],
        "Swan View Senior High": ["Swan View"],
    }
}

expect output would be something like;

{
    {
        "date": "25 March 2019",
        "street": "Kent Street"
        "suburb": "Karawara"
    }, {
        "date": "26 March 2019",
        "street": "St Thomas Primary"
        "suburb": "Claremont"
    }
}

Rules First value is always street. Second value is suburb. Some cases there are two suburb. Idealy we would have two rows but if not its mine leaving them as one.

I found similar questions like Pandas read nested json but couldn't find any examples where the json file simply had zero headers.

AshHimself
  • 4,024
  • 1
  • 21
  • 26

1 Answers1

3

if I understand correctly you want the following:

first of all, read the Json file and transform it into a Dictionary

import json

 with open('<yourFile>.json', 'r') as JSON:
        json_dict = json.load(JSON)

Then, i suppose you have this:

x={
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
        "Kitchener Road": ["Alfred Cove"],
        "Alexander Road": ["Rivervale"],
        "Kwinana Freeway": ["Wellard"],
    },
    "26 March 2019": {
        "Great Eastern Highway": ["Sawyers Valley", "Redcliffe"],
        "South Western Highway": ["Armadale", "Wungong"],
        "Great Northern Highway": ["Muchea", "Baskerville"],
        "St Thomas Primary": ["Claremont"],
        "Stirling Highway": ["Claremont"],
        "Grovelands Primary": ["Camillo"],
        "Swan View Senior High": ["Swan View"],
    }
}

You can do this:

df=pd.DataFrame([(j,z,h) for i in x.values() for j in x.keys() for h,z in i.items()],columns=['Date','suburb','street'])

print(df)

             Date                       suburb                  street
0   25 March 2019     [Maddington, Cannington]          Albany Highway
1   25 March 2019                   [Karawara]             Kent Street
2   25 March 2019                [Alfred Cove]          Kitchener Road
3   25 March 2019                  [Rivervale]          Alexander Road
4   25 March 2019                    [Wellard]         Kwinana Freeway
5   26 March 2019     [Maddington, Cannington]          Albany Highway
6   26 March 2019                   [Karawara]             Kent Street
7   26 March 2019                [Alfred Cove]          Kitchener Road
8   26 March 2019                  [Rivervale]          Alexander Road
9   26 March 2019                    [Wellard]         Kwinana Freeway
10  25 March 2019  [Sawyers Valley, Redcliffe]   Great Eastern Highway
11  25 March 2019          [Armadale, Wungong]   South Western Highway
12  25 March 2019        [Muchea, Baskerville]  Great Northern Highway
13  25 March 2019                  [Claremont]       St Thomas Primary
14  25 March 2019                  [Claremont]        Stirling Highway
15  25 March 2019                    [Camillo]      Grovelands Primary
16  25 March 2019                  [Swan View]   Swan View Senior High
17  26 March 2019  [Sawyers Valley, Redcliffe]   Great Eastern Highway
18  26 March 2019          [Armadale, Wungong]   South Western Highway
19  26 March 2019        [Muchea, Baskerville]  Great Northern Highway
20  26 March 2019                  [Claremont]       St Thomas Primary
21  26 March 2019                  [Claremont]        Stirling Highway
22  26 March 2019                    [Camillo]      Grovelands Primary
23  26 March 2019                  [Swan View]   Swan View Senior High

Alternatively, you can do this like this:

dic=[{'date':j,'street':z,'suburb':h} for i in x.values() for j in x.keys() for h,z in i.items()]

dic

[{'date': '25 March 2019',
  'street': ['Maddington', 'Cannington'],
  'suburb': 'Albany Highway'},
 {'date': '25 March 2019', 'street': ['Karawara'], 'suburb': 'Kent Street'},
 {'date': '25 March 2019',
  'street': ['Alfred Cove'],
  'suburb': 'Kitchener Road'},
 {'date': '25 March 2019',
  'street': ['Rivervale'],
  'suburb': 'Alexander Road'},
 {'date': '25 March 2019', 'street': ['Wellard'], 'suburb': 'Kwinana Freeway'},
 {'date': '26 March 2019',
  'street': ['Maddington', 'Cannington'],
  'suburb': 'Albany Highway'},
 {'date': '26 March 2019', 'street': ['Karawara'], 'suburb': 'Kent Street'},
 {'date': '26 March 2019',
  'street': ['Alfred Cove'],
  'suburb': 'Kitchener Road'},
 {'date': '26 March 2019',
  'street': ['Rivervale'],
  'suburb': 'Alexander Road'}

...

as lists of dictionaries. Now you can transform it into dataframe like this:

df=pd.DataFrame(d)

             date                       street                  suburb
0   25 March 2019     [Maddington, Cannington]          Albany Highway
1   25 March 2019                   [Karawara]             Kent Street
2   25 March 2019                [Alfred Cove]          Kitchener Road
3   25 March 2019                  [Rivervale]          Alexander Road
4   25 March 2019                    [Wellard]         Kwinana Freeway
5   26 March 2019     [Maddington, Cannington]          Albany Highway
6   26 March 2019                   [Karawara]             Kent Street
7   26 March 2019                [Alfred Cove]          Kitchener Road
8   26 March 2019                  [Rivervale]          Alexander Road
9   26 March 2019                    [Wellard]         Kwinana Freeway
10  25 March 2019  [Sawyers Valley, Redcliffe]   Great Eastern Highway
11  25 March 2019          [Armadale, Wungong]   South Western Highway
12  25 March 2019        [Muchea, Baskerville]  Great Northern Highway
13  25 March 2019                  [Claremont]       St Thomas Primary
14  25 March 2019                  [Claremont]        Stirling Highway
15  25 March 2019                    [Camillo]      Grovelands Primary
16  25 March 2019                  [Swan View]   Swan View Senior High
17  26 March 2019  [Sawyers Valley, Redcliffe]   Great Eastern Highway
18  26 March 2019          [Armadale, Wungong]   South Western Highway
19  26 March 2019        [Muchea, Baskerville]  Great Northern Highway
20  26 March 2019                  [Claremont]       St Thomas Primary
21  26 March 2019                  [Claremont]        Stirling Highway
22  26 March 2019                    [Camillo]      Grovelands Primary
23  26 March 2019                  [Swan View]   Swan View Senior High
Billy Bonaros
  • 1,671
  • 11
  • 18