I'm new to Python and am still trying to tear myself away from C++ coding techniques while in Python, so please forgive me if this is a trivial question. I can't seem to find the most Pythonic way of doing this.
I have two lists of dicts. The individual dicts in both lists may contain nested dicts. (It's actually some Yelp data, if you're curious.) The first list of dicts contains entries like this:
{'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ',
 'categories': ['Restaurants'],
 'type': 'business',
 ...}
The second list of dicts contains entries like this:
{'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA',
'date': '2010-03-22',
'review_id': 'RF6UnRTtG7tWMcrO2GEoAg',
'stars': 2,
'text': "This is a basic review",
...}
What I would like to do is extract all the entries in the second list that match specific categories in the first list. For example, if I'm interested in restaurants, I only want the entries in the second list whose `business_id` matches a `business_id` in the first list and where the word `Restaurants` appears in that business's list of values for `categories`.
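In plain Python, one way to sketch this (variable names `businesses` and `reviews` are my assumptions for your two lists) is to build a set of matching ids in one pass and then filter the second list against it, which avoids the quadratic scan of checking every review against every business:

```python
# Assumed stand-ins for the two large lists of dicts described above.
businesses = [
    {'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ',
     'categories': ['Restaurants'],
     'type': 'business'},
    {'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA',
     'categories': ['Bars'],
     'type': 'business'},
]
reviews = [
    {'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA', 'stars': 2,
     'text': "This is a basic review"},
    {'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ', 'stars': 4,
     'text': "Great food"},
]

# Build the set of restaurant ids once; set membership tests are O(1).
restaurant_ids = {b['business_id'] for b in businesses
                  if 'Restaurants' in b.get('categories', [])}

# Keep only the reviews whose business_id is in that set.
restaurant_reviews = [r for r in reviews
                      if r['business_id'] in restaurant_ids]
```

This is effectively a hash join: one pass over each list, with memory proportional to the number of matching ids rather than to the full cross product.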
If I had these two lists as tables in SQL, I'd do a join on the `business_id` attribute and then a simple filter to get the rows I want (`WHERE 'Restaurants' IN categories`, or something similar).
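That SQL mental model can be tried without standing up a server, using Python's built-in `sqlite3` with an in-memory database. This is only a sketch with table and column names I made up; since SQL has no list-valued columns, the `categories` list is flattened into one `(business_id, category)` row per category:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE categories (business_id TEXT, category TEXT)')
con.execute('CREATE TABLE reviews (business_id TEXT, stars INTEGER, text TEXT)')

# Toy rows standing in for the two real lists.
con.executemany('INSERT INTO categories VALUES (?, ?)',
                [('JwUE5GmEO-sH1FuwJgKBlQ', 'Restaurants'),
                 ('vcNAWiLM4dR7D2nwwJ7nCA', 'Bars')])
con.executemany('INSERT INTO reviews VALUES (?, ?, ?)',
                [('vcNAWiLM4dR7D2nwwJ7nCA', 2, 'This is a basic review'),
                 ('JwUE5GmEO-sH1FuwJgKBlQ', 4, 'Great food')])

# The join-then-filter described above.
rows = con.execute(
    """SELECT r.business_id, r.stars, r.text
       FROM reviews r
       JOIN categories c ON r.business_id = c.business_id
       WHERE c.category = 'Restaurants'""").fetchall()
```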
These two lists are extremely large, so I'm running into both efficiency and memory space issues. Before I go and shove all of this into a SQL database, can anyone give me some pointers? I've messed around with Pandas some, so I do have some limited experience with that. I was having trouble with the merge process.
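For the pandas route, a hedged sketch (DataFrame and column names are assumptions): filter the business frame down to restaurants *before* merging, so the merge only carries the rows you need; merging the two full frames is usually what blows up memory.

```python
import pandas as pd

# Toy frames standing in for the two real lists of dicts.
biz = pd.DataFrame([
    {'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ', 'categories': ['Restaurants']},
    {'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA', 'categories': ['Bars']},
])
rev = pd.DataFrame([
    {'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA', 'stars': 2},
    {'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ', 'stars': 4},
])

# Keep only restaurant rows first (categories holds Python lists,
# so test membership with apply).
restaurants = biz[biz['categories'].apply(lambda cats: 'Restaurants' in cats)]

# Inner merge on business_id: only reviews of restaurants survive.
merged = rev.merge(restaurants[['business_id']],
                   on='business_id', how='inner')
```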