I'm currently accessing a JSON file to gather information about companies, branches and workers. In this JSON there are multiple companies, in which every company has its own branches and each branch has its own workers. Below there's an example of the JSON structure I'm working on.
{
"company": [
{
"companyName": "MyCompany",
"branches": [
{
"branchName": "My First Branch",
"workers": [
{
"workerName": "John",
"wage": "10"
},
{
"workerName": "Lucas",
"wage": "20"
}
]
},
{
"branchName": "My Second Branch",
"workers": [
{
"workerName": "Mary",
"wage": "30"
},
{
"workerName": "Jack",
"wage": "40"
}
]
}
]
},
{
"companyName": "YourCompany",
"branches": [
{
"branchName": "Your First Branch",
"workers": [
{
"workerName": "George",
"wage": "15"
},
{
"workerName": "Harry",
"wage": "25"
}
]
},
{
"branchName": "Your Second Branch",
"workers": [
{
"workerName": "Wayne",
"wage": "35"
},
{
"workerName": "Rose",
"wage": "45"
}
]
}
]
}
]
}
My goal is to gather informations regarding all companies, branches and workers in one dataframe for each of them. For that, I'm currently using loops, as shown below.
Companies = my_json['Companies'] #accessing list of companies
Branches = None
Workers = None
for i in range(len(Companies)):
company_branches = Companies[i]['Branches'] #accessing branches for that company
if(Branches is None):
Branches = pd.DataFrame(company_branches)
else:
Branches = pd.concat([Branches,pd.DataFrame(company_branches)])
for j in range(len(company_branches)):
branch_workers = company_branches[j]['Workers'] #accessing workers for that branch
if(Workers is None):
Workers = pd.DataFrame(branch_workers)
else:
Workers = pd.concat([Workers,pd.DataFrame(branch_workers)])
Companies = pd.DataFrame(Companies)
This solves my problem, creating the three desired dataframes (with some extra columns for companies and branches which I would still drop), but I'm currently having performance issues. I've been trying to solve this problem without loops, but I can't create the dataframes correctly. If I try
Companies = pd.DataFrame(my_json['companies'])
it creates the dataframe correctly, but then if I try
Branches = pd.DataFrame(Companies.branches.values)
it isn't created correctly. It doesn't raise an error, but it's basically copying the column from the Companies dataframe, creating a single column with the JSON code for the branches for each company. In this case, what I wanted was as many columns as the amount of attributes the branches have.
Any tips on how to solve this problem in an efficient way?