I am working on a pandas dataframe, business, which was created by reading a csv file in Python. The csv file contains data from the website Yelp.com. One of the columns called attributes has a nested data structure. The length of rows in these columns vary. The following will hopefully make that clear:
business.attributes[0]
>"BikeParking: True,BusinessAcceptsBitcoin: False,BusinessAcceptsCreditCards: True,BusinessParking: {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False},DogsAllowed: False,RestaurantsPriceRange2: 2,WheelchairAccessible: True"
business.attributes[1]
>'BusinessAcceptsBitcoin: False,BusinessAcceptsCreditCards: True'
business.attributes[2]
>"Alcohol: none,Ambience: {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': False},BikeParking: True,BusinessAcceptsCreditCards: True,BusinessParking: {'garage': False, 'street': False, 'validated': False, 'lot': False, 'valet': False},Caters: True,GoodForKids: True,GoodForMeal: {'dessert': False, 'latenight': False, 'lunch': False, 'dinner': False, 'breakfast': False, 'brunch': False},HasTV: True,NoiseLevel: quiet,OutdoorSeating: False,RestaurantsAttire: casual,RestaurantsDelivery: True,RestaurantsGoodForGroups: True,RestaurantsPriceRange2: 1,RestaurantsReservations: False,RestaurantsTableService: False,RestaurantsTakeOut: True,WiFi: free"
I would like to insert multiple columns in the business dataframe and capture values under each attribute in a separate column. Capturing each row in the attribute column gives me a string, but I am struggling to split it because of various types of delimiters and the nested structure. I know that there are 34 different attributes in the dataset with 6 of them having nested values. A sample of an expected output for third row in the above example is in the image below:
Any suggestions in Python or R are welcome to do this task. Please help!!