2

I know this is has been asked many times but I still have no solution to my problem and I'm not that good at programming so basically I have so many json files but the format of the json is very different so I can't get the output that I want with pandas library , here is an example of the json file .

[
    {
        "Names": [
            "James",
            "Bob",
            "David"
        ],
        "Salary": [
            "2000$",
            "5000$",
            "6000$"
        ],
        "Id": [
            "1",
            "2",
            "3"
        ]
    },
    {
        "Names": [
            "John",
            "Charles",
            "Harry"
        ],
        "Salary": [
            "1000$",
            "2000$",
            "3000$"
        ],
        "Id": [
            "4",
            "5",
            "6"
        ]
    }
]

when I did convert this with pandas using this simple code:

import json
import pandas as pd

df=pd.read_json("test.json")
df.to_csv("results.csv")

https://i.stack.imgur.com/jPlQx.png

the problem is the output of csv file give me all the 3 names in one single cell just like that ['James', 'Bob', 'David'] , ['2000$', '5000$', '6000$'] ... but I want only one name in the cell not all 3 of them I'm very new to these stuff I will appreciate any help

Rei M
  • 25
  • 2

5 Answers5

1

Yes, you are getting that answer because each value for the given key in the dictionary contains a list and not a single element. Therefore there are two levels you should address when working with that kind of json format.

data is a list that contains two dictionaries with the same keys each. Either dictionary contains keys that contain a list. Therefore we need to iterate over the first list (to address each dictionary) and then over the second to adress each value for any specified key. The output will be the table as you desire. It should be noted that this code will work as length of the list values of "Names" is the same as "Salary" and "Id".

import pandas as pd 
import numpy as np
data = [
    {
        "Names": ["James","Bob","David"],
        "Salary": ["2000$","5000$","6000$"],
        "Id": ["1","2","3"]},
    {
        "Names": ["John","Charles","Harry"],
        "Salary": ["1000$","2000$","3000$"],
        "Id": ["4","5","6"]}
]
to_df = {'Names':[],'Salary':[],'Id':[]}
for i in range(len(data)):
    for j in range(len(data[i]['Id'])):
        to_df['Names'].append(data[i]['Names'][j])
        to_df['Salary'].append(data[i]['Salary'][j])
        to_df['Id'].append(data[i]['Id'][j])
df = pd.DataFrame(to_df)
print(df)

Output:

     Names Salary Id
0    James  2000$  1
1      Bob  5000$  2
2    David  6000$  3
3     John  1000$  4
4  Charles  2000$  5
5    Harry  3000$  6
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • 1
    Thank you , your code is working like a charm but one last thing the length of the data is 2 which is right and the length data[i] is 3 the problem when I add more names "Names": ["James","Bob","David","Bob","David"] for some reason your code will only loop the first 3 value and ignore the rest what if I have more than three in each of the "Names" – Rei M Oct 07 '19 at 14:01
  • There you go, it should work fine now. But this only works if the amount of "Names" is equal to the amount of "Id" and "Salary". – Celius Stingher Oct 07 '19 at 14:06
  • 1
    Thank you it's working just as you said this exactly what I was looking for however when I did this data = [pd.read_json("data.json")] it did give the same problem as before lol and I just noticed in the data you didn't add [ at first so I tried to delete it in the json file and it didn't work – Rei M Oct 07 '19 at 15:38
  • I'm glad to help, you can check some of my other answers regarding how to deal with Jsons in pandas! https://stackoverflow.com/questions/57875259/how-to-extract-nested-json-data/57875343#57875343 – Celius Stingher Oct 07 '19 at 15:41
0
a = [
    {
        "Names": [
            "James",
            "Bob",
            "David"
        ],
        "Salary": [
            "2000$",
            "5000$",
            "6000$"
        ],
        "Id": [
            "1",
            "2",
            "3"
        ]
    },
    {
        "Names": [
            "John",
            "Charles",
            "Harry"
        ],
        "Salary": [
            "1000$",
            "2000$",
            "3000$"
        ],
        "Id": [
            "4",
            "5",
            "6"
        ]
    }
]

I think this might solve your problem:

col_names = [k for k,v in a[0].items()]
frames = [pd.io.json.json_normalize(a, str(col)) for col in col_names]
final_df = pd.concat(frames, axis = 1)
final_df.columns = col_names

Output: '

  Id Salary    Names
0  1  2000$    James
1  2  5000$      Bob
2  3  6000$    David
3  4  1000$     John
4  5  2000$  Charles
5  6  3000$    Harry
naive
  • 367
  • 1
  • 3
  • 9
0

The problem isn't in the function but in the way the json is defined. The pandas output is therefore exactly as it should be.

Instead of reading it in differently you could simply format your dataframe further to show the output you want. At the moment each row for each column is a list, so you need to unnest:

import json
import pandas as pd
import numpy as np

df=pd.read_json('data.json')

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

unnesting(df,['Names','Salary','Id'])

The self-made unnesting function comes thanks to WeNYoBen

Fnguyen
  • 1,159
  • 10
  • 23
0

Your data input isn't structured properly for the output you wanted, you'll need to fix that first:

import json

with open('test.json', 'r') as file:
    data = json.load(file.read())

# flatten the data to {'Name': [...], 'Salary': [...], ...}
d = {}
for elem in data:
    for k, v in elem.items():
        d.setdefault(k, []).extend(v)

df = pd.DataFrame(d).reindex(columns=['Id', 'Names', 'Salary'])

Result:

  Id    Names Salary
0  1    James  2000$
1  2      Bob  5000$
2  3    David  6000$
3  4     John  1000$
4  5  Charles  2000$
5  6    Harry  3000$
r.ook
  • 13,466
  • 2
  • 22
  • 39
0

I think you want each row to have the id, name and salary. You can achieve this as follows:

import pandas as pd

df=pd.read_json("test.json")

new_df = pd.DataFrame(columns=['id', 'name', 'salary'])
for _, row in df.iterrows():
    new_df = new_df.append(pd.DataFrame(
        {'id': row.Id, 'name': row.Names, 'salary': row.Salary}))

new_df.to_csv("results.csv")

result in results.csv

,id,name,salary
0,1,James,2000$
1,2,Bob,5000$
2,3,David,6000$
0,4,John,1000$
1,5,Charles,2000$
2,6,Harry,3000$

Basically the initial dataframe df has the id, names and salary data in lists, so what you do is to make a new dataframe new_df, then loop over dataframe df and append to new_df the dataframe (with same structure as df_new) that has the values of the rows properly under each column.

This will work no matter how long the lists in rows are as long as they are the same for Id, Names and Salary...

Bruno Vermeulen
  • 2,970
  • 2
  • 15
  • 29