I have a CSV with 500+ rows where one column, "_source", is stored as JSON. I want to extract that into a pandas DataFrame, with each key as its own column.
I have a 1 MB JSON file of online social media data, and I need to convert its dictionary keys and values into separate columns. The data comes from Facebook, Twitter, web crawls, etc.
There are 528 rows of posts/tweets/text, each containing many dictionaries nested inside dictionaries.
I am including a few steps from my Jupyter notebook below to give a more complete picture. I need to turn every key-value pair, including those in nested dictionaries, into a column of a DataFrame.
I tried converting it to a DataFrame like this:
source = pd.DataFrame.from_dict(source, orient='columns')
It returns something like the following. I thought it would unpack the dictionaries, but it did not:
source.head()
_source
0 {'sub_organization_id': 'default', 'uid': 'aba...
1 {'sub_organization_id': 'default', 'uid': 'ab0...
2 {'sub_organization_id': 'default', 'uid': 'ac0...
Below is the shape:
source.shape
(528, 1)
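For context, here is a minimal sketch of one common way to unpack such a column, using pandas.json_normalize. It rests on assumptions: the two-row `raw` frame and the `to_dict` helper are hypothetical stand-ins for the real data, and the cells are assumed to be either dicts or strings holding a Python-style dict. The sample row uses single quotes and None, which is not strict JSON, so ast.literal_eval is used here instead of json.loads.

```python
import ast

import pandas as pd

# Hypothetical stand-in for the real data: each "_source" cell is a string
# holding a Python-style dict (single quotes, None), like the sample row.
raw = pd.DataFrame({
    "_source": [
        "{'uid': 'a1', 'type': 'crawl', 'doc': {'status_code': 200, 'attrs': {'uid': 'x'}}}",
        "{'uid': 'b2', 'type': 'crawl', 'doc': {'status_code': 404, 'attrs': {'uid': None}}}",
    ]
})


def to_dict(cell):
    """Return the cell as a dict, parsing it first if it is a string."""
    return cell if isinstance(cell, dict) else ast.literal_eval(cell)


records = raw["_source"].map(to_dict).tolist()

# json_normalize expands nested dicts into dotted column names,
# e.g. 'doc.status_code' and 'doc.attrs.uid'.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

If the cells are already dicts, the parsing step is a no-op and only the json_normalize call matters.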
The following is a sample row of "_source". There are many nested dictionaries and key:value pairs, and each key needs to be its own column.
{
    'sub_organization_id': 'default',
    'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
    'project_veid': 'default',
    'campaign_id': 'default',
    'organization_id': 'default',
    'meta': {
        'rule_matcher': [{
            'atribs': {
                'website': 'github.com/res',
                'source': 'Explicit',
                'version': '1.1',
                'type': 'crawl'
            },
            'results': [{
                'rule_type': 'hashtag',
                'rule_tag': 'Far',
                'description': None,
                'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
                'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
                'value': '#Far',
                'organization_id': None,
                'sub_organization_id': None,
                'appid': 'ray',
                'project_id': 'CDE2F42-5B87-C594-C900E578C',
                'rule_id': '1838',
                'node_id': None,
                'metadata': {
                    'campaign_title': 'AF',
                    'project_title': 'AF '
                }
            }
            ]
        }
        ],
        'render': [{
            'attribs': {
                'website': 'github.com/res',
                'version': '1.0',
                'type': 'Page Render'
            },
            'results': [{
                'render_status': 'success',
                'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
                'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
                'url': 'https://discooprdapp.com/',
                'load_time': 32
            }
            ]
        }
        ]
    },
    'norm_attribs': {
        'website': 'github.com/res',
        'version': '1.1',
        'type': 'crawl'
    },
    'project_id': 'default',
    'system_timestamp': '2019-02-22T19:04:53.569623',
    'doc': {
        'appid': 'subtter',
        'links': [],
        'response_url': 'https://discooprdapp.com',
        'url': 'https://discooprdapp.com/',
        'status_code': 200,
        'status_msg': 'OK',
        'encoding': 'utf-8',
        'attrs': {
            'uid': '2ab8f2651cb32261b911c990a8b'
        },
        'timestamp': '2019-02-22T19:04:53.963',
        'crawlid': '7fd95-785-4dd259-fcc-8752f'
    },
    'type': 'crawl',
    'norm': {
        'body': '\n',
        'domain': 'discordapp.com',
        'author': 'crawl',
        'url': 'https://discooprdapp.com',
        'timestamp': '2019-02-22T19:04:53.961283+00:00',
        'id': '7fc5-685-4dd9-cc-8762f'
    }
}
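Note that the sample row also contains lists of dictionaries ('rule_matcher', 'render', 'results'). pandas.json_normalize expands nested dicts but leaves such lists as single cells unless record_path is used. One possible way to get every leaf key as its own column is a small recursive flattener. The sketch below is only an illustration under assumptions: the `flatten` helper and the trimmed-down `record` are hypothetical, and putting the list position into the column name is a design choice, not the only option.

```python
import pandas as pd


def flatten(obj, prefix="", sep="."):
    """Recursively flatten nested dicts and lists into a one-level dict.

    List items get their position as a key part, for example
    'meta.render.0.results.0.load_time'. Empty dicts and empty lists
    are kept as values so their keys are not lost.
    """
    if isinstance(obj, dict) and obj:
        items = obj.items()
    elif isinstance(obj, list) and obj:
        items = enumerate(obj)
    else:
        # Leaf value (or an empty container): store it under the current key.
        return {prefix: obj}

    out = {}
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        out.update(flatten(value, name, sep))
    return out


# Trimmed-down hypothetical record with the same shape as the sample row.
record = {
    "uid": "ac0",
    "meta": {
        "render": [{
            "attribs": {"version": "1.0"},
            "results": [{"render_status": "success", "load_time": 32}],
        }]
    },
    "doc": {"links": [], "status_code": 200},
}

# One flattened dict per row gives one wide row per record.
wide = pd.DataFrame([flatten(record)])
print(wide.T)
```

Applied to all 528 rows, this would be something like pd.DataFrame([flatten(r) for r in records]), where `records` is the list of parsed "_source" dicts. Rows whose lists differ in length end up with NaN in the columns they lack.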