
I have the following JSON file:

{
  "date": "2021-07-08",
  "info": "success",
  "data": [
    {
      "customerid": "1",
      "site1": ["2021-07-08T20:56:16+02:00"]
    },
    {
      "customerid": "2",
      "isvip": true,
      "site1": ["2021-07-08T18:53:53+02:00"]
    },
    {
      "customerid": "3",
      "isvip": true,
      "site1": ["2021-07-08T13:29:04+02:00", "2021-07-08T13:29:02+02:00"],
      "site2": [ "2021-07-08T05:15:12+02:00", "2021-07-08T05:15:08+02:00"]
    }
  ]
}

And I would like to transform it into a SQL table for our database like this:

| customerid | isvip       | siteid | timestamp           |
| ---------- | ----------- | ------ | ------------------- |
| 1          | false       | site1  | 2021-07-08 20:56:16 |
| 2          | true        | site1  | 2021-07-08 18:53:53 |
| 3          | true        | site1  | 2021-07-08 13:29:04 |
| 3          | true        | site1  | 2021-07-08 13:29:02 |
| 3          | true        | site2  | 2021-07-08 05:15:12 |
| 3          | true        | site2  | 2021-07-08 05:15:08 |

There are several problems that I have to face:

1 - I don't care about the date and info in the beginning. They can be eliminated.

2 - The info about isvip only shows up if the value is true. Otherwise, it's false.

3 - Several sites can be visited. But they only appear if they actually have been visited. For example, customer 3 is the only one who visited site2.

4 - (optional) the isvip-field is metadata about the ids which could be put in a separate table. But it's just a binary field, so the use of storage capacity is not a big deal yet.

I haven't worked with JSON files that much yet. I would like to do the code either in SQL or Python (or both).

Thank you guys for your help!!!

Julio Nobre
cvluepke
  • Your question needs more focus. You can start by loading your json into a dictionary: `with open("your_file.json") as f: your_dict = json.load(f)` – Tranbi Aug 12 '21 at 07:35

2 Answers


Only the contents of the data key need to be converted into a table; date and info can be ignored.

Step 1: Load the JSON file as a dict

import json
import pandas as pd  # needed for the DataFrame in step 3

with open("data.json") as file_handle:
    # json.load returns a dictionary
    dict_data = json.load(file_handle)

Step 2: Get the list stored under the data key

useful_data = dict_data['data']

Step 3: Convert the data to a pandas DataFrame and reshape or format it as required

data_as_table = pd.DataFrame(useful_data)
data_as_table.to_csv("/path/to/save/file.csv")

Step 4: Import the CSV file into your SQL database, or save the DataFrame directly to MySQL. I sometimes use Adminer for the import.
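
Note that a plain `pd.DataFrame(useful_data)` gives one column per site with lists in the cells, not the one-row-per-visit layout from the question. The reshaping and the database write could look roughly like the sketch below. It is only a sketch under a few assumptions: the sample records are copied from the question, the table name `visits` and the in-memory SQLite database are placeholders for your real target, and the timestamp is shortened by plain string slicing, which keeps the local wall-clock time and simply drops the offset.

```python
import sqlite3
import pandas as pd

# Sample records copied from the "data" key in the question.
useful_data = [
    {"customerid": "1", "site1": ["2021-07-08T20:56:16+02:00"]},
    {"customerid": "2", "isvip": True, "site1": ["2021-07-08T18:53:53+02:00"]},
    {"customerid": "3", "isvip": True,
     "site1": ["2021-07-08T13:29:04+02:00", "2021-07-08T13:29:02+02:00"],
     "site2": ["2021-07-08T05:15:12+02:00", "2021-07-08T05:15:08+02:00"]},
]

wide = pd.DataFrame(useful_data)
# isvip only appears when true; a missing value (NaN) is not equal to True.
wide["isvip"] = wide["isvip"].eq(True)
site_cols = [c for c in wide.columns if c.startswith("site")]

visits = (
    wide.melt(id_vars=["customerid", "isvip"], value_vars=site_cols,
              var_name="siteid", value_name="timestamp")
    .explode("timestamp")          # one row per timestamp in each list
    .dropna(subset=["timestamp"])  # drop sites a customer never visited
    .reset_index(drop=True)
)
# "2021-07-08T20:56:16+02:00" -> "2021-07-08 20:56:16"
visits["timestamp"] = (
    visits["timestamp"].str.slice(0, 19).str.replace("T", " ", regex=False)
)

conn = sqlite3.connect(":memory:")  # assumption: replace with your database
visits.to_sql("visits", conn, index=False, if_exists="replace")
print(visits)
```

For MySQL you would pass an SQLAlchemy engine to `to_sql` instead of the `sqlite3` connection.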

  • `with open("data.json") as file_handle: json_data = json.load(file_handle)` already returns a dict – Netim Aug 12 '21 at 07:46
  • Thanks, I have updated the answer according to your suggestion! –  Aug 12 '21 at 07:51

You could try something like this:

import json
import pandas as pd


with open('a.json', 'r') as f:
    json_data = json.load(f)


data = json_data['data']

formatted_data = []
for entry in data:
    # isvip is only present when it is true, so a missing key means False
    prefix_list = [entry.get('customerid'), bool(entry.get('isvip'))]
    for k in entry.keys():
        if 'site' in k:
            # one output row per timestamp recorded for this site
            for timestamp in entry[k]:
                formatted_data.append(prefix_list + [k, timestamp])

print(formatted_data)

df = pd.DataFrame(formatted_data, columns=['customerid', 'isvip', 'siteid', 'timestamp'])
print(df)
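
To get from these rows to the table in the question, the timestamps still need the `T` and the UTC offset removed, and the rows have to be inserted somewhere. Here is a minimal sketch using only the standard library. The table name `visits`, the column types, and the in-memory SQLite database are assumptions for illustration, and the two sample rows are taken from the question.

```python
import sqlite3
from datetime import datetime

# Rows in the shape produced by the loop above (sample values from the question).
rows = [
    ["1", False, "site1", "2021-07-08T20:56:16+02:00"],
    ["3", True, "site2", "2021-07-08T05:15:12+02:00"],
]

def normalize(ts):
    """Drop the UTC offset and keep the local wall-clock time."""
    local = datetime.fromisoformat(ts).replace(tzinfo=None)
    return local.strftime("%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(":memory:")  # assumption: replace with your database
conn.execute(
    "CREATE TABLE visits "
    "(customerid TEXT, isvip INTEGER, siteid TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [(cid, int(vip), site, normalize(ts)) for cid, vip, site, ts in rows],
)
conn.commit()
print(conn.execute("SELECT * FROM visits").fetchall())
```

SQLite has no boolean type, so isvip is stored as 0 or 1 here.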
Netim
  • Thank you Netim, that almost got me the result I needed. I needed to add a list with all prospective names of sites that the customers can visit (site_list = ['site1','site2']) and then add another "for site in site_list:" beneath your line "for k in entry.keys():" . After that, replace 'site' with site. Thank you so much! – cvluepke Aug 12 '21 at 08:09
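
One way to read the change described in this comment is sketched below. It is not the commenter's exact code: instead of nesting a second loop under `for k in entry.keys():`, it looks up each name from `site_list` directly, which should give the same rows as long as the site keys match the list exactly. The function name `flatten` is made up for the example.

```python
def flatten(entries, site_list):
    """Return one [customerid, isvip, siteid, timestamp] row per visit."""
    rows = []
    for entry in entries:
        prefix = [entry.get("customerid"), bool(entry.get("isvip"))]
        for site in site_list:
            # Sites the customer never visited are simply absent from the entry.
            for timestamp in entry.get(site, []):
                rows.append(prefix + [site, timestamp])
    return rows


site_list = ["site1", "site2"]
sample = [
    {"customerid": "1", "site1": ["2021-07-08T20:56:16+02:00"]},
    {"customerid": "3", "isvip": True,
     "site1": ["2021-07-08T13:29:04+02:00"],
     "site2": ["2021-07-08T05:15:12+02:00"]},
]
for row in flatten(sample, site_list):
    print(row)
```

Keys that are not in `site_list` are ignored, so an unexpected field that happens to contain the text "site" no longer ends up in the table.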