12

I went through the entire Polars documentation but couldn't find anything that converts nested JSON into a DataFrame.

test = {
  "name": "Ravi",
  "Subjects": {
    "Maths": 92,
    "English": 94,
    "Hindi": 98
  }
}

json_normalize in pandas would convert this to a DataFrame with the columns name, Subjects.Maths, Subjects.English and Subjects.Hindi. Is this possible in Polars? I tried every function I could find, but each one throws an error because it doesn't understand the nested structure.

Shikha Sheoran
  • There is no json_normalize in Polars like there is in Pandas. You may want to use another package to flatten the dict first, such as https://pypi.org/project/json-flatten/ – jvz Nov 21 '21 at 08:49
  • AFAIK the arrow2 crate doesn't deal with nested structs very well – Moriarty Snarly Mar 28 '22 at 08:46

2 Answers

7

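For a simple JSON-like dictionary, you can use a dict comprehension to wrap each value in a one-element list, so that every key becomes a column with a single row. The nested dictionary then ends up as a struct column.

An example: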

import polars as pl

grades = {
  "name": "Ravi",
  "Subjects": {
    "Maths": 92,
    "English": 94,
    "Hindi": 98
  }
}

# Wrap each value in a one-element list so each key becomes a one-row column
grades_with_list = {key: [value] for key, value in grades.items()}
pl.DataFrame(grades_with_list)

# Output
shape: (1, 2)
┌──────┬────────────┐
│ name ┆ Subjects   │
│ ---  ┆ ---        │
│ str  ┆ struct[3]  │
╞══════╪════════════╡
│ Ravi ┆ {92,94,98} │
└──────┴────────────┘

# You can also unnest the Subjects column to get a separate column for each subject.

pl.DataFrame(grades_with_list).unnest('Subjects')

# Output
shape: (1, 4)
┌──────┬───────┬─────────┬───────┐
│ name ┆ Maths ┆ English ┆ Hindi │
│ ---  ┆ ---   ┆ ---     ┆ ---   │
│ str  ┆ i64   ┆ i64     ┆ i64   │
╞══════╪═══════╪═════════╪═══════╡
│ Ravi ┆ 92    ┆ 94      ┆ 98    │
└──────┴───────┴─────────┴───────┘

Luca
  • See more examples of `unnest` [here](https://github.com/pola-rs/polars/issues/7078#issuecomment-1441905809) also see the new `json_extract` [here](https://github.com/pola-rs/polars/issues/7222) (instead of the comprehension) – ecoe Apr 17 '23 at 19:32
3

No, there is not. AFAIK, the recommended approach is to use the pandas function and then load the resulting pandas DataFrame into Polars.

Moritz Wilksch
  • I think the OP's main goal would be avoiding using Pandas. Maybe volumes are too large to use it. – Seb Jun 16 '23 at 14:14