1

from a Dataframe, I want to have a JSON output file with one key having a list:

Expected output:

[
  {
    "model": "xx",
    "id": 1,
    "name": "xyz",
    "categories": [1,2],
  },
  {
    ...
  },
]

What I have:

[
  {
    "model": "xx",
    "id": 1,
    "name": "xyz",
    "categories": "1,2",
  },
  {
    ...
  },
]

The actual code is :

df = pd.read_excel('data_threated.xlsx')
result = df.reset_index(drop=True).to_json("output_json.json", orient='records')
parsed = json.dumps(result)

jsonfile = open("output_json.json", 'r')
data = json.load(jsonfile)

How can I achive this easily?

EDIT:

print(df['categories'].unique().tolist())

['1,2,3', 1, nan, '1,2,3,6', 9, 8, 11, 4, 5, 2, '1,2,3,4,5,6,7,8,9']
Minseven
  • 122
  • 1
  • 7

1 Answers1

1

You can use:

df = pd.read_excel('data_threated.xlsx').reset_index(drop=True)
df['categories'] = df['categories'].apply(lambda x: [int(i) for i in x.split(',')] if isinstance(x, str) else '')
df.to_json('output.json', orient='records', indent=4)

Content of output.json

[
    {
        "model":"xx",
        "id":1,
        "name":"xyz",
        "categories":[
            1,
            2
        ]
    }
]

Note you can also use:

df['categories'] = pd.eval(df['categories'])
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • I have an error AttributeError: 'int' object has no attribute 'split' Maybe it's due to the inconsistent data in catagories: I may have - "1,2,3" - "1" (this one is treated like an integer) - "" (empty) – Minseven Feb 04 '22 at 15:49
  • Ok. I will fixed it – Corralien Feb 04 '22 at 15:50
  • Do you test the second solution with `pd.eval`? – Corralien Feb 04 '22 at 15:51
  • yes, returning an error due to empty; pandas.core.computation.ops.UndefinedVariableError: name 'nan' is not defined – Minseven Feb 04 '22 at 15:54
  • Just `from numpy import nan` and retry :) – Corralien Feb 04 '22 at 15:55
  • Another error :D AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis' It seems to be buggy: https://stackoverflow.com/questions/48008191/attributeerror-pandasexprvisitor-object-has-no-attribute-visit-ellipsis-us/48008192#48008192 – Minseven Feb 04 '22 at 15:59
  • 1
    Ok you have more 100 records in your dataframe so `pd.eval` is useless here – Corralien Feb 04 '22 at 16:00
  • 1
    Can you update your post with the output of `df['categories'].unique().tolist()` please? – Corralien Feb 04 '22 at 16:03
  • Post Edited. Now I have added df = df.replace(np.nan, '', regex=True) error: AttributeError: 'int' object has no attribute 'split' On line: df['categories'] = df['categories'].apply(lambda x: [int(i) for i in x.split(',')]) – Minseven Feb 04 '22 at 16:10
  • Can you use `[int(i) for i in x.split(',')] if isinstance(str, x) else ''` inside `apply` please? – Corralien Feb 04 '22 at 16:59
  • 1
    Still not working, really struggling here :/ df['categories'] = df['categories'].apply(lambda x: [int(i) for i in x.split(',')] if isinstance(str, x) else '') TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union – Minseven Feb 04 '22 at 18:39