I am learning how to parse data, and I am trying to build templates I can reuse later by changing only the parameters of the loops, functions, and methods.
I scraped the Twitter API for hashtag-related tweets and got back a list of nested dictionaries. I then saved the scraped data to a txt file and have been trying to clean the text and convert it into a table of rows. My problem when building the table is locating the headers: the first line of the txt file contains all the headers I need, but each header has a value next to it, and some of those values are dictionaries with their own key-value pairs. Most tutorials use sample files whose first line is a clean row of column titles with nothing in between. This is more complex, and I figured that once I learn how to do this I can happily move on.
Here is the data; sorry if it is messy. I cleaned it in Notepad by starting each new line at 'domain' (I did not know how to do this in Python; that would be a plus to know). The data starts with a square bracket, indicating a list. Each item in the list is a dictionary with 2 key-value pairs, 'domain' and 'entity', and the value of each is itself a dictionary with 2-3 key-value pairs.
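For the Notepad step, a minimal Python sketch might look like the following. It assumes the saved text is a Python literal (single-quoted, so it is not valid JSON) and parses it with `ast.literal_eval`. The names `raw_text` and `one_record_per_line` are made up for illustration, and the sample holds just two of the records.

```python
import ast

# Two records from the scraped data, joined on one line as in the raw file.
raw_text = (
    "[{'domain': {'id': '47', 'name': 'Brand', 'description': 'Brands and Companies'}, "
    "'entity': {'id': '1372588659346612225', 'name': 'Binance'}}, "
    "{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', "
    "'description': 'A taxonomy of user interests. '}, "
    "'entity': {'id': '1486271512655003652', 'name': 'Web3'}}]"
)

def one_record_per_line(text):
    """Parse a Python-literal list of dicts and return it with one dict per line."""
    # literal_eval only evaluates literals, so it is safer than eval().
    records = ast.literal_eval(text)
    return "[" + ",\n".join(repr(record) for record in records) + "]"

formatted = one_record_per_line(raw_text)
print(formatted)
```

The result is still a valid Python literal, so it can be parsed again later with `ast.literal_eval`.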
All I need to do is turn the keys into headers for the first line, since the keys are almost the same on every line of the txt file (a few entities have no 'description'), and then build a table from the headers and values.
[{'domain': {'id': '46', 'name': 'Business Taxonomy', 'description': 'Categories within Brand Verticals that narrow down the scope of Brands'}, 'entity': {'id': '1557696848252391426', 'name': 'Financial Services Business', 'description': 'Brands, companies, advertisers and every non-person handle with the profit intent related to Banks, Credit cards, Insurance, Investments, Stocks '}},
{'domain': {'id': '46', 'name': 'Business Taxonomy', 'description': 'Categories within Brand Verticals that narrow down the scope of Brands'}, 'entity': {'id': '1557697333571112960', 'name': 'Technology Business', 'description': 'Brands, companies, advertisers and every non-person handle with the profit intent related to softwares, apps, communication equipments, hardwares'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '47', 'name': 'Brand', 'description': 'Brands and Companies'}, 'entity': {'id': '1372588659346612225', 'name': 'Binance'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '857879456773357569', 'name': 'Technology', 'description': 'Technology'}},
{'domain': {'id': '66', 'name': 'Interests and Hobbies Category', 'description': 'A grouping of interests and hobbies entities, like Novelty Food or Destinations'}, 'entity': {'id': '913142676819648512', 'name': 'Cryptocurrencies', 'description': 'Cryptocurrency'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1001503516555337728', 'name': 'Blockchain', 'description': 'Blockchain'}},
{'domain': {'id': '66', 'name': 'Interests and Hobbies Category', 'description': 'A grouping of interests and hobbies entities, like Novelty Food or Destinations'}, 'entity': {'id': '1369311988040355840', 'name': 'NFTs', 'description': 'Non-fungible tokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '781974596148793345', 'name': 'Business & finance'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '781974596794716162', 'name': 'Financial services'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '847894353708068864', 'name': 'Investing', 'description': 'Investing'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '848920371311001600', 'name': 'Technology', 'description': 'Technology and computing'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '913142676819648512', 'name': 'Cryptocurrencies', 'description': 'Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1369311988040355840', 'name': 'NFTs', 'description': 'Non-fungible tokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1390680741206368263', 'name': 'Cryptocurrency exchanges'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1478776259068907541', 'name': 'Cryptotokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1484181943616884743', 'name': 'Cryptocoins'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1486271512655003652', 'name': 'Web3'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1491481998862348291', 'name': 'Digital asset industry'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1492162686204854274', 'name': 'Digital assets & cryptocurrency', 'description': 'Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1521397643909365760', 'name': 'NFT development'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1536439027636678656', 'name': 'Decentralized finance'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1478776259068907541', 'name': 'Cryptotokens'}}]
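One way the headers might be derived from data shaped like this is to flatten each record so that the outer and inner keys are joined into a single column name. This is only a sketch under my own assumptions: the `domain_id`-style names, the `flatten` helper, and filling missing cells with an empty string are choices made for illustration. The sample holds two of the records above, one of which has no entity description.

```python
import ast

# Two records copied from the data above; the second has no entity 'description'.
sample_text = """[{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '47', 'name': 'Brand', 'description': 'Brands and Companies'}, 'entity': {'id': '1372588659346612225', 'name': 'Binance'}}]"""

def flatten(record):
    """Turn {'domain': {'id': ...}} into {'domain_id': ...} for every nested key."""
    flat = {}
    for outer_key, inner in record.items():
        for inner_key, value in inner.items():
            flat[f"{outer_key}_{inner_key}"] = value
    return flat

records = [flatten(r) for r in ast.literal_eval(sample_text)]

# Collect headers in first-seen order across all records,
# because some records are missing a key.
headers = []
for rec in records:
    for key in rec:
        if key not in headers:
            headers.append(key)

# Fill absent cells with an empty string so every row has the same width.
rows = [[rec.get(h, "") for h in headers] for rec in records]

print(headers)
for row in rows:
    print(row)
```

Building the header list from every record, and not only from the first line, keeps a column from being lost when the first record happens to lack a key.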
I tried this code, but the headers cannot be located this way.
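import json
import re
import os
from tabulate import tabulate

# Read every line of the saved tweet data into a list of strings.
with open('binance_hash_tweets_micro.txt', 'r') as file:
    read = file.readlines()

modified = []  # empty list that later loops can fill and change
for row in read:
    modified.append(row)
print(modified)

# The first line of the file; this is one long string, not a list of header names.
header = modified.pop(0)

def fixed_length(text, length):
    # Truncate or pad with spaces so the text is exactly `length` characters.
    if len(text) > length:
        text = text[:length]
    elif len(text) < length:
        text = (text + " " * length)[:length]
    return text

# Looping over a string visits one character at a time,
# so each single character gets printed as its own 20-wide column.
for column in header:
    print(fixed_length(column, 20), end=" ")
print()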
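Once the headers and rows exist as plain lists, the fixed-width printing idea from my code could be applied per cell and not per character. The sketch below assumes the data has already been flattened. The `format_table` helper and the three column names are hypothetical, and the values are taken from the data above.

```python
def fixed_length(text, length):
    """Pad with spaces or truncate so the result is exactly `length` characters."""
    return str(text).ljust(length)[:length]

def format_table(headers, rows, width=20):
    """Return the table as a list of fixed-width text lines, header line first."""
    lines = [" ".join(fixed_length(h, width) for h in headers)]
    for row in rows:
        lines.append(" ".join(fixed_length(cell, width) for cell in row))
    return lines

# Hypothetical, already-flattened input: a list of header names
# and one list of cell values per record.
headers = ["domain_id", "domain_name", "entity_name"]
rows = [
    ["47", "Brand", "Binance"],
    ["131", "Unified Twitter Taxonomy", "Web3"],
]

for line in format_table(headers, rows):
    print(line)
```

Since `tabulate` is already imported in my attempt, `tabulate(rows, headers=headers)` would presumably take the same two lists and handle the column widths itself.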
If someone could help, I would appreciate it. :)