
I have a JSON file with deeply nested columns. I saw this post:

https://stackoverflow.com/a/41168691/10718214

Then I tried this code:

 import pandas as pd  # pd.json_normalize requires pandas >= 1.0
 df = pd.DataFrame.from_dict(pd.json_normalize(data), orient='columns')

It works, but it only flattens the first level of nesting. For example, I have a column `entities` where each row looks like this:

{'hashtags': [{'text': 'رويه_العلا', 'indices': [65, 76]}], 'urls': [], 'user_mentions': [{'screen_name': 'a_albander', 'name': 'عبدالله البندر', 'id': 248141082, 'id_str': '248141082', 'indices': [3, 14]}], 'symbols': [], 'media': [{'id': 1094650709386121218, 'id_str': '1094650709386121218', 'indices': [115, 138], 'additional_media_info': {'monetizable': False}, 'media_url': 'http://pbs.twimg.com/ext_tw_video_thumb/1094650709386121218/pu/img/W_V9kGPCPdgI3_G1.jpg', 'media_url_https': 'https://pbs.twimg.com/ext_tw_video_thumb/1094650709386121218/pu/img/W_V9kGPCPdgI3_G1.jpg', 'url': '', 'display_url': 'pic.twitter.com/iKMkqHCZbd', 'expanded_url': 'https://twitter.com/a_albander/status/1094651355287994369/video/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1200, 'h': 675, 'resize': 'fit'}, 'small': {'w': 680, 'h': 383, 'resize': 'fit'}, 'large': {'w': 1280, 'h': 720, 'resize': 'fit'}}, 'source_status_id': 1094651355287994369, 'source_status_id_str': '1094651355287994369', 'source_user_id': 248141082, 'source_user_id_str': '248141082'}]}

When I run the code above, each key of `entities` ends up in its own column:

entities.hashtags, entities.media, entities.symbols, entities.urls, entities.user_mentions, etc.

But as you can see, `entities.hashtags` itself contains nested fields that were not split into separate columns. Each cell still holds a list like this:

[{'text': 'شتاء_طنطورة', 'indices': [89, 101]}]
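Here is a minimal reproduction of this behavior. It uses a hypothetical two-tweet `data` list, trimmed down to `hashtags` and `urls`, and assumes pandas >= 1.0 for `pd.json_normalize`. Nested dicts are flattened into dotted column names, but list values are left untouched in a single cell.

```python
import pandas as pd

# Hypothetical sample: two tweets whose `entities` are trimmed-down
# versions of the record shown in the question.
data = [
    {"id": 1, "entities": {"hashtags": [{"text": "رويه_العلا", "indices": [65, 76]}], "urls": []}},
    {"id": 2, "entities": {"hashtags": [{"text": "شتاء_طنطورة", "indices": [89, 101]}], "urls": []}},
]

df = pd.json_normalize(data)

# The nested dict `entities` was flattened into dotted column names...
print(df.columns.tolist())
# ...but each `entities.hashtags` cell is still a Python list of dicts.
print(type(df.loc[0, "entities.hashtags"]))
```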

How can I split them into separate columns like these?

entities.hashtags.text, entities.hashtags.indices

Any help? Thank you.
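One possible approach is sketched below on a hypothetical `data` list; it is not taken from the linked post. `DataFrame.explode` turns each hashtag into its own row (its `ignore_index` argument needs pandas >= 1.1). A second `pd.json_normalize` call then splits each hashtag dict into `text` and `indices`, and `add_prefix` gives the new columns the requested `entities.hashtags.` names.

```python
import pandas as pd

# Hypothetical sample: tweet 1 has two hashtags, tweet 2 has none.
data = [
    {"id": 1, "entities": {"hashtags": [
        {"text": "رويه_العلا", "indices": [65, 76]},
        {"text": "شتاء_طنطورة", "indices": [89, 101]},
    ]}},
    {"id": 2, "entities": {"hashtags": []}},
]

df = pd.json_normalize(data)

# One row per hashtag; an empty list becomes a single row holding NaN.
exploded = df.explode("entities.hashtags", ignore_index=True)

# Replace NaN placeholders with empty dicts so every row has a record,
# then flatten each hashtag dict and prefix the resulting column names.
records = [h if isinstance(h, dict) else {} for h in exploded["entities.hashtags"]]
tags = pd.json_normalize(records).add_prefix("entities.hashtags.")

# Swap the list column for the flattened hashtag columns (indexes align 0..n-1).
result = pd.concat([exploded.drop(columns="entities.hashtags"), tags], axis=1)
print(result)
```

This approach has a trade-off. A tweet with several hashtags now spans several rows, and its other columns are repeated on each one. A tweet with no hashtags keeps one row with NaN in the new columns.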

vmf91
Fatima
  • I don't think you can achieve what you want with your current JSON structure and `json_normalize`. Your `hashtags` property is a list of items, and there is no way to relate multiple items to a single row in your DataFrame. If `hashtags` should actually be a single item, as in your example, it will probably work if you don't make it a list (`'hashtags': {'text': 'رويه_العلا', 'indices': [65, 76]}`). – hygorxaraujo Mar 10 '19 at 14:13
  • And how can I do this? Is there an example? – Fatima Mar 11 '19 at 06:42
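Below is a sketch of what the comment suggests, assuming each tweet has at most one hashtag. Before normalizing, the `hashtags` list is replaced with its single element. `json_normalize` then flattens that element into `entities.hashtags.text` and `entities.hashtags.indices` on the tweet's own row. The helper `unwrap_first_hashtag` and the sample `data` are hypothetical. Any hashtag after the first is discarded, so this only fits data where the lists never hold more than one item.

```python
import copy

import pandas as pd

# Hypothetical sample: at most one hashtag per tweet, as the comment assumes.
data = [
    {"id": 1, "entities": {"hashtags": [{"text": "رويه_العلا", "indices": [65, 76]}]}},
    {"id": 2, "entities": {"hashtags": []}},
]


def unwrap_first_hashtag(tweet):
    """Hypothetical helper: return a copy of `tweet` whose entities.hashtags
    list is replaced by its first element (or {} if empty).
    Any hashtags after the first are DISCARDED."""
    tweet = copy.deepcopy(tweet)  # leave the original record untouched
    tags = tweet.get("entities", {}).get("hashtags", [])
    tweet.setdefault("entities", {})["hashtags"] = tags[0] if tags else {}
    return tweet


# A dict (not a list) is flattened into dotted columns; an empty dict adds
# no keys, so that tweet gets NaN in the hashtag columns.
df = pd.json_normalize([unwrap_first_hashtag(t) for t in data])
print(df)
```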

0 Answers