I've got a list of dictionaries of dictionaries... Basically, it is just a big piece of JSON. Here is what one dict from the list looks like:

{'id': 391257, 'from_id': -1, 'owner_id': -1, 'date': 1554998414, 'marked_as_ads': 0, 'post_type': 'post', 'text': 'Весна — время обновлений. Очищаем балконы от старых лыж и API от устаревших версий: уже скоро запросы к API c версией ниже 5.0 перестанут поддерживаться.\n\nОжидаемая дата изменений: 15 мая 2019 года. \n\nПодробности в Roadmap: https://vk.com/dev/version_update_2.0', 'post_source': {'type': 'vk'}, 'comments': {'count': 91, 'can_post': 1, 'groups_can_post': True}, 'likes': {'count': 182, 'user_likes': 0, 'can_like': 1, 'can_publish': 1}, 'reposts': {'count': 10, 'user_reposted': 0}, 'views': {'count': 63997}, 'is_favorite': False}

I want to load each dict into a DataFrame. If I just do

data = pandas.DataFrame(list_of_dicts)

I get a frame with only two columns: the first contains the keys and the other contains the data.

I tried doing it in a loop:

for i in list_of_dicts:
    tmp = pandas.DataFrame().from_dict(i)
    data = pandas.concat([data, tmp])
    print(i)

But I get a ValueError:

Traceback (most recent call last):
  File "/home/keddad/PycharmProjects/vk_group_parse/Data Grabber.py", line 68, in <module>
    main()
  File "/home/keddad/PycharmProjects/vk_group_parse/Data Grabber.py", line 61, in main
    tmp = pandas.DataFrame().from_dict(i)
  File "/home/keddad/anaconda3/envs/vk_group_parse/lib/python3.7/site-packages/pandas/core/frame.py", line 1138, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/home/keddad/anaconda3/envs/vk_group_parse/lib/python3.7/site-packages/pandas/core/frame.py", line 392, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/keddad/anaconda3/envs/vk_group_parse/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 212, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/keddad/anaconda3/envs/vk_group_parse/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 51, in arrays_to_mgr
    index = extract_index(arrays)
  File "/home/keddad/anaconda3/envs/vk_group_parse/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 320, in extract_index
    raise ValueError('Mixing dicts with non-Series may lead to '
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

How can I get a DataFrame with one row per post (each dictionary in the list is one post) and all of the post's data as columns?

keddad
  • Not a dupe, but you can take a look at [this answer](https://stackoverflow.com/a/53831756/4909087) for converting a list of dictionaries to a DataFrame. For dictionaries that are nested, use `json_normalize`. – cs95 Apr 19 '19 at 08:29
  • @cs95, thanks, now it is better, but the "dicts in dicts" are still not flattened. I still have a piece of JSON in the "attachments" column, for example. I'm looking for a way to turn the keys in these JSON pieces into normal columns too. For now I can iterate through the column, parse it and merge it with the frame, but that solution can hardly be called elegant. I'll keep looking for a more elegant one :D – keddad Apr 19 '19 at 08:51
  • As @cs95 said - don't you just want `df = pd.io.json.json_normalize(list_of_dicts)` ? It will flatten the json so each key:value becomes a column – danielR9 Apr 19 '19 at 09:01
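
The `json_normalize` suggestion in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's real data: the two posts, the shape of the "attachments" list, and the names `posts`, `flat` and `attachments` are all made up for the example. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`; older versions expose it as `pd.io.json.json_normalize`.

```python
import pandas as pd

# Two made-up posts, loosely shaped like the dict in the question.
# The "attachments" structure is an assumption for illustration only.
posts = [
    {
        "id": 1,
        "text": "first",
        "likes": {"count": 182, "user_likes": 0},
        "views": {"count": 63997},
        "attachments": [
            {"type": "photo", "photo": {"id": 10}},
            {"type": "link", "link": {"url": "https://vk.com"}},
        ],
    },
    {
        "id": 2,
        "text": "second",
        "likes": {"count": 5, "user_likes": 1},
        "views": {"count": 10},
        "attachments": [],
    },
]

# Nested dicts become columns named parent_child; lists are left as they are,
# which is why "attachments" still holds raw JSON-like data after this step.
flat = pd.json_normalize(posts, sep="_")
print(flat.columns.tolist())

# To expand a list of dicts, point record_path at it and carry the post id
# along via meta. This yields one row per attachment, not one row per post.
attachments = pd.json_normalize(posts, record_path="attachments", meta=["id"], sep="_")
print(attachments)
```

Note that `record_path` expects every post to have the "attachments" key; posts without it would need a default (for example an empty list) before normalizing. The per-attachment frame can then be merged back onto `flat` by `id` if a single table is needed.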

2 Answers

I can't tell exactly what your df looks like, but I think you simply need a reset_index, so that the data which is currently (it seems) in the index becomes a regular column:

df.reset_index(inplace=True)

Another option, if you want the keys as columns:

df = pd.DataFrame.from_dict(d, orient='columns')  # d is your dict
# or try orient='index' if you don't get the desired result

In a for loop:

frames = []
for key in d.keys():  # d is your dict of dicts; note that keys() must be called
    frames.append(pd.DataFrame.from_dict(d[key], orient='columns'))
df = pd.concat(frames)
Sid

Not quite sure what you are trying to do, but do you mean something like this?

You can inspect the data by just printing the DataFrame, or you can print each column separately with the following code.

data = pandas.DataFrame(list_of_dicts)
print(data)

for col in data.columns:
    print(data[col])
Tim