
I need to parse this response into a pandas dataframe.

{
  "predictions": [
    {
        "predicted_label": 4.0,
        "distances": [3.11792408, 3.89746071, 6.32548437],
        "labels": [0.0, 1.0, 0.0]
    },
    {
        "predicted_label": 2.0,
        "distances": [1.08470316, 3.04917915, 5.25393973],
        "labels": [2.0, 2.0, 0.0]
    }
  ]
}

The end result I am looking for is:

   predicted_label   distances  labels
0              4.0  3.11792408     0.0
1              4.0  3.89746071     1.0
2              4.0  6.32548437     0.0

followed by the same three-row layout for the second predicted_label, 2.0.

I tried using:

pd.json_normalize(result['predictions'], record_path='distances', meta='predicted_label', record_prefix='dist_')

but that does not give me the labels column.

Henry Ecker

2 Answers


I am assuming result takes the following format:

result = {"predictions": [{"predicted_label": 4.0,"distances": [3.11792408, 3.89746071, 6.32548437],"labels": [0.0, 1.0, 0.0]},{"predicted_label": 2.0,"distances": [1.08470316, 3.04917915, 5.25393973],"labels": [2.0, 2.0, 0.0]}]}
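If the response is still a JSON string at this point, the standard library's `json.loads` produces that dictionary; with the `requests` library, `response.json()` performs the same step. The `raw` string below is a hypothetical stand-in for the actual response body, so treat this as a minimal sketch.

```python
import json

# Hypothetical raw body of the API response, as a JSON string
raw = """
{"predictions": [
  {"predicted_label": 4.0, "distances": [3.11792408, 3.89746071, 6.32548437], "labels": [0.0, 1.0, 0.0]},
  {"predicted_label": 2.0, "distances": [1.08470316, 3.04917915, 5.25393973], "labels": [2.0, 2.0, 0.0]}
]}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists of floats
result = json.loads(raw)
print(result["predictions"][0]["distances"])  # [3.11792408, 3.89746071, 6.32548437]
```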

If you pass result['predictions'] to pd.DataFrame, the distances and labels columns will hold lists, because each record has a single "predicted_label" but three "distances" and three "labels":

>>> pd.DataFrame(result['predictions'])
   predicted_label                             distances           labels
0              4.0  [3.11792408, 3.89746071, 6.32548437]  [0.0, 1.0, 0.0]
1              2.0  [1.08470316, 3.04917915, 5.25393973]  [2.0, 2.0, 0.0]

To get around this, we can set predicted_label as the index, apply pd.Series.explode to the other columns (credit goes to @yatu's answer here), and then reset the index. Since the exploded values came from lists, those columns have dtype object, so we can use applymap to convert everything to float.

To display 8 digits after the decimal point, set the display format: pd.options.display.float_format = "{:.8f}".format

>>> pd.DataFrame(result['predictions']).set_index('predicted_label').apply(pd.Series.explode).reset_index().applymap(lambda x: float(x))

   predicted_label  distances     labels
0       4.00000000 3.11792408 0.00000000
1       4.00000000 3.89746071 1.00000000
2       4.00000000 6.32548437 0.00000000
3       2.00000000 1.08470316 2.00000000
4       2.00000000 3.04917915 2.00000000
5       2.00000000 5.25393973 0.00000000
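A self-contained sketch of the same pipeline, assuming a recent pandas, with one swap: `astype(float)` in place of `applymap(lambda x: float(x))`. Both should yield float64 columns here; `astype` is the more direct spelling, and `DataFrame.applymap` has been deprecated since pandas 2.1 in favour of `DataFrame.map`. The `option_context` part shows that the 8-digit format only changes what is printed, not what is stored.

```python
import pandas as pd

result = {"predictions": [
    {"predicted_label": 4.0, "distances": [3.11792408, 3.89746071, 6.32548437], "labels": [0.0, 1.0, 0.0]},
    {"predicted_label": 2.0, "distances": [1.08470316, 3.04917915, 5.25393973], "labels": [2.0, 2.0, 0.0]},
]}

exploded = (
    pd.DataFrame(result["predictions"])
    .set_index("predicted_label")   # keep the scalar column out of the explode
    .apply(pd.Series.explode)       # one row per list element, column by column
    .reset_index()                  # turn predicted_label back into a column
)
# The exploded columns hold Python floats in object-dtype columns,
# so cast the whole frame to float64 in one step.
df = exploded.astype(float)
print(df.dtypes)

# Temporarily print 8 digits after the decimal point; the data is unchanged.
with pd.option_context("display.float_format", "{:.8f}".format):
    print(df)
print(df.loc[0, "distances"] == 3.11792408)  # True
```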
Derek O
  • FWIW the `set_index` part is unnecessary; explode handles "unexplodeable" elements quite well: `pd.DataFrame(result['predictions']).apply(pd.Series.explode).reset_index(drop=True)` – Henry Ecker Jun 12 '21 at 20:40
  • One problem with the answer is that the returned numbers are now objects. They also have fewer digits: 3.11792408 vs 3.117924. That prevents the next "join" step that I am trying to accomplish – PRAEMERE LLC Jun 12 '21 at 21:11
  • That's quite peculiar - let me look into this and see if casting as a string then back to float might be a workaround – Derek O Jun 12 '21 at 21:16
  • The number of digits is not actually smaller; pandas only rounds what it prints (6 decimal places by default). You can control the formatting with `pd.options.display.float_format = "{:.8f}".format`, which will display all 8 digits after the decimal, for example – Derek O Jun 12 '21 at 21:25
  • I updated the answer - hopefully this helps! – Derek O Jun 12 '21 at 21:29
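Along the lines of the comments above, a further alternative, assuming pandas 1.3 or later: `DataFrame.explode` accepts a list of columns, so both list columns can be exploded together without `set_index` or `apply`, and a final `astype(float)` addresses the object-dtype problem raised in the comments. This is a swapped-in technique, not the one used in the answer or the comment.

```python
import pandas as pd

result = {"predictions": [
    {"predicted_label": 4.0, "distances": [3.11792408, 3.89746071, 6.32548437], "labels": [0.0, 1.0, 0.0]},
    {"predicted_label": 2.0, "distances": [1.08470316, 3.04917915, 5.25393973], "labels": [2.0, 2.0, 0.0]},
]}

# Exploding both list columns together keeps each distance paired with its label;
# predicted_label is repeated for every new row. ignore_index=True renumbers rows 0..n-1.
df = (
    pd.DataFrame(result["predictions"])
    .explode(["distances", "labels"], ignore_index=True)
    .astype(float)  # exploded values come out as object dtype; cast back to float64
)
print(df)
```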

The response looks like a list of records, so you could parse them one by one and then concatenate the results:

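import pandas as pd

frames = []
for dd in result['predictions']:
    # each record becomes a 3-row frame; the scalar predicted_label is repeated
    frames.append(pd.DataFrame(dd))
df = pd.concat(frames).reset_index(drop=True)  # reset_index if needed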
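The same idea can be written as a list comprehension, with `ignore_index=True` on `pd.concat` doing the job of `reset_index(drop=True)`. Because `pd.DataFrame(record)` builds each column straight from the lists (broadcasting the scalar `predicted_label`), every column should come out as float64 without any cast, which sidesteps the object-dtype issue raised in the comments on the other answer. This is a sketch assuming `result` is the dictionary from the question.

```python
import pandas as pd

result = {"predictions": [
    {"predicted_label": 4.0, "distances": [3.11792408, 3.89746071, 6.32548437], "labels": [0.0, 1.0, 0.0]},
    {"predicted_label": 2.0, "distances": [1.08470316, 3.04917915, 5.25393973], "labels": [2.0, 2.0, 0.0]},
]}

# One small frame per record; the scalar predicted_label is broadcast to match the lists
frames = [pd.DataFrame(record) for record in result["predictions"]]

# Stack the frames and renumber the rows 0..n-1
df = pd.concat(frames, ignore_index=True)
print(df.dtypes)
```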
Derek O
SCKU