DataFrame 1 (result_df) is created inside a function and DataFrame 2 (total_device_df) is created in the main function. When I run the main function, the values read from a JSON file are stored in result_df and returned, but I get this error: ValueError: cannot set a frame with no defined columns

Why am I creating DataFrame 2 in the main function? Because I use it (total_device_df) in other functions to convert it to CSV.

Reproducible code:

import pandas as pd
import os
import json

class KatsRequest:

currDir = os.getcwd()
    def parse_json_response():

        filename = "my_json_file.json"
        device_name = ["Trona", "Sheldon"]
        "creating dataframe to store result"
        column_names = ["DEVICE", "STATUS", "LAST UPDATED"]
        result_df = pd.DataFrame(columns=column_names)
        my_json_file = currDir + '/' + filename

        for i in range(len(device_name)):
            my_device_name = device_name[i]
            with open(my_json_file) as f:
                data = json.load(f)

            for devices in data:
                device_types = devices['device_types']
                if my_device_name in device_types['name']:
                    if device_types['name'] == my_device_name:
                        device = devices['device_types']['name']
                        last_updated = devices['devices']['last_status_update']
                        device_status = devices['devices']['status']

                        result_df.loc[len(result_df)] = {'DEVICE': device, 'STATUS': device_status, 'LAST UPDATED': last_updated}
        return result_df

def main():
    total_device_df = pd.DataFrame()
    total_device_df.loc[len(total_device_df)] = KatsRequest().parse_json_response(filename, device_names)

if __name__ == '__main__':
    main()

Here are the contents of my JSON file (save it in your current directory as "my_json_file.json"):

[{"devices": {"id": 34815, "last_status_update": "2023-05-25 07:56:49", "status": "idle" }, "device_types": {"name": "Trona"}}, {"devices": {"id": 34815, "last_status_update": "2023-05-25 07:56:49", "status": "idle" }, "device_types": {"name": "Sheldon"}}]

Output: ValueError: cannot set a frame with no defined columns

What is missing/wrong here?

MDI

1 Answer

(Your example is not reproducible)

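total_device_df is created without any columns, so a new row cannot be added to it through .loc; that is what the error message says. You have to use pd.concat if you want to combine total_device_df with the DataFrame returned by parse_json_response.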

def main():
    total_device_df = pd.DataFrame()
    total_device_df = pd.concat([total_device_df, KatsRequest().parse_json_response(filename, device_names)])

Or simply:

def main():
    total_device_df = pd.DataFrame()
    total_device_df = KatsRequest().parse_json_response(filename, device_names)
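
To see the failing assignment and the pd.concat fix side by side without the JSON file, here is a minimal sketch. The `parsed` DataFrame is a hand-built, hypothetical stand-in for what parse_json_response returns, and `ignore_index=True` is an optional extra that is not used in the snippets above:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by parse_json_response
parsed = pd.DataFrame(
    {
        "DEVICE": ["Trona", "Sheldon"],
        "STATUS": ["idle", "idle"],
        "LAST UPDATED": ["2023-05-25 07:56:49", "2023-05-25 07:56:49"],
    }
)

total_device_df = pd.DataFrame()  # no columns defined yet

# Adding a row through .loc to a frame without columns raises the ValueError
try:
    total_device_df.loc[len(total_device_df)] = parsed
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# pd.concat stacks the rows and takes the columns from the non-empty frame
total_device_df = pd.concat([total_device_df, parsed], ignore_index=True)
print(total_device_df)
```

Because the combined frame has the same content as `parsed`, the direct assignment shown above is usually enough unless several results have to be accumulated.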

Full example:

import os
import json
import pandas as pd

currDir = os.getcwd()
filename = 'my_json_file.json'
device_names = ['Trona', 'Sheldon']

class KatsRequest:
    def parse_json_response(self, filename, device_name):

        # creating dataframe to store result
        column_names = ["DEVICE", "STATUS", "LAST UPDATED"]
        result_df = pd.DataFrame(columns=column_names)
        my_json_file = currDir + '/' + filename

        for i in range(len(device_name)):
            my_device_name = device_name[i]
            with open(my_json_file) as f:
                data = json.load(f)

            for devices in data:
                device_types = devices['device_types']
                if my_device_name in device_types['name']:
                    if device_types['name'] == my_device_name:
                        device = devices['device_types']['name']
                        last_updated = devices['devices']['last_status_update']
                        device_status = devices['devices']['status']

                        result_df.loc[len(result_df)] = {'DEVICE': device, 'STATUS': device_status, 'LAST UPDATED': last_updated}
        return result_df

def main():
    total_device_df = pd.DataFrame()
    total_device_df = pd.concat([total_device_df, KatsRequest().parse_json_response(filename, device_names)])
    print(total_device_df)

if __name__ == '__main__':
    main()

Output:

>>> total_device_df
    DEVICE STATUS         LAST UPDATED
0    Trona   idle  2023-05-25 07:56:49
1  Sheldon   idle  2023-05-25 07:56:49

Update:

A painless way is to use pd.json_normalize:

with open('my_json_file.json') as jp:
    data = json.load(jp)
total_device_df = (pd.json_normalize(data).loc[lambda x: x['device_types.name'].isin(device_names)]
                     .rename(columns=lambda x: x.split('.', maxsplit=1)[-1].upper()))

Output:

>>> total_device_df
      ID   LAST_STATUS_UPDATE STATUS     NAME
0  34815  2023-05-25 07:56:49   idle    Trona
1  34815  2023-05-25 07:56:49   idle  Sheldon
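
To make the column names in that output less mysterious, the following sketch inlines the records from the question (so no file is needed) and shows the intermediate step: pd.json_normalize flattens nested keys into dotted column names, and the rename keeps only the part after the first dot. The variable names `flat` and `renamed` are only for illustration:

```python
import pandas as pd

# Same records as my_json_file.json, inlined so the sketch runs without the file
data = [
    {
        "devices": {"id": 34815, "last_status_update": "2023-05-25 07:56:49", "status": "idle"},
        "device_types": {"name": "Trona"},
    },
    {
        "devices": {"id": 34815, "last_status_update": "2023-05-25 07:56:49", "status": "idle"},
        "device_types": {"name": "Sheldon"},
    },
]

flat = pd.json_normalize(data)
# Nested keys become dotted column names such as 'devices.status'
print(list(flat.columns))

# Keep the part after the first dot and upper-case it
renamed = flat.rename(columns=lambda c: c.split(".", maxsplit=1)[-1].upper())
print(list(renamed.columns))
```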

How to use pd.json_normalize in the full example code:

import os
import json
import pandas as pd

currDir = os.getcwd()
filename = 'my_json_file.json'
device_names = ['Trona', 'Sheldon']

class KatsRequest:
    def parse_json_response(self, filename, device_name):

        # map the flattened JSON column names to the output column names
        dmap = {
            'device_types.name': 'DEVICE',
            'devices.status': 'STATUS',
            'devices.last_status_update': 'LAST STATUS'
        }
        my_json_file = currDir + '/' + filename

        with open(my_json_file) as f:
            data = json.load(f)
        # keep the mapped columns, rename them, then filter on the requested devices
        results_df = (pd.json_normalize(data)[list(dmap)].rename(columns=dmap)
                        .loc[lambda x: x['DEVICE'].isin(device_name)])
        return results_df

def main():
    total_device_df = pd.DataFrame()
    total_device_df = pd.concat([total_device_df, KatsRequest().parse_json_response(filename, device_names)])
    print(total_device_df)

if __name__ == '__main__':
    main()
Corralien
  • The problem is that I am using pandas >2.0, and according to https://stackoverflow.com/questions/75956209/dataframe-object-has-no-attribute-append/75956237#75956237 concat and append have been deprecated; I guess that's the reason – MDI May 26 '23 at 06:36
  • `concat` is not deprecated! `append` has been removed! – Corralien May 26 '23 at 06:37
  • I tried, but the error below was thrown: ` self.device_dsn = devices['devices']['dsn'] ~~~~~~~~~~~~~~~~~~^^^^^^^ KeyError: 'dsn'` Any idea why? – MDI May 26 '23 at 06:39
  • Because there is no key dsn in your json? – Corralien May 26 '23 at 06:40
  • There is a dsn key in my JSON, and 'dsn' is also displayed by this print statement `print(total_device_df)` along with the other details. What could be the reason for this error? – MDI May 26 '23 at 06:42
  • Try to use `pd.json_normalize`, it could be easier to debug. – Corralien May 26 '23 at 06:48
  • Can you please update the answer to show how to use pd.json_normalize in the full example code? – MDI May 26 '23 at 06:53
  • I updated my answer according to your request. Can you check it, please? – Corralien May 26 '23 at 07:05
  • How can we ignore the index here? result_df.loc[len(result_df)] = {'DEVICE': device, 'STATUS': device_status, 'LAST UPDATED': last_updated} Could that be the reason for this error? `self.device_dsn = devices['devices']['dsn'] ~~~~~~~~~~~~~~~~~~^^^^^^^ KeyError: 'dsn'` – MDI May 26 '23 at 09:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253837/discussion-between-mdi-and-corralien). – MDI May 26 '23 at 09:20
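
On the `KeyError: 'dsn'` raised in the comments: the real JSON is not shown, so this is only a guess, but one possible cause is that some records lack the key even though others have it. The sketch below uses made-up records (the `dsn` value is hypothetical) to show how pd.json_normalize exposes such records as NaN, and how dict.get avoids the exception in a plain loop:

```python
import pandas as pd

# Made-up records: only the first one carries a 'dsn' key (hypothetical values)
data = [
    {"devices": {"dsn": "ABC123", "status": "idle"}, "device_types": {"name": "Trona"}},
    {"devices": {"status": "idle"}, "device_types": {"name": "Sheldon"}},
]

flat = pd.json_normalize(data)

# Records without the key get NaN instead of raising KeyError
missing = flat[flat["devices.dsn"].isna()]
print(missing["device_types.name"].tolist())

# In a plain loop, dict.get returns None for a missing key instead of raising
dsns = [record["devices"].get("dsn") for record in data]
print(dsns)
```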