
I am working with a Yelp dataset stored as a pandas dataframe. Each row contains information about a business, and each column holds a specific kind of information, such as consumer ratings, categories, attributes, etc. I am specifically interested in extracting information from the attributes column of the dataframe. The attributes field in each row contains multiple elements, and the number of elements differs from row to row (see the attached image of the attributes column).

I would like to extract the entry 'RestaurantsPriceRange2: 1' from the attributes cell. Note that the value of 'RestaurantsPriceRange2' varies from row to row and can be 1, 2, 3, or 4. I tried collecting each row into a list, but the length of the list differs for each row, so the entry is not at a fixed position.

Can someone suggest how to pick the information that I want from the attributes column?

Rnovice

1 Answer


It looks like the attributes are also separated by commas. You could split each cell into individual attributes on the commas that are not inside braces (see "How to split by commas that are not within parentheses?" for a regex you can use as the separator, replacing the parentheses with braces), then parse the resulting attributes and add them to the dataframe as columns of their own. That way, you can access RestaurantsPriceRange2 directly.
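A minimal sketch of that approach, assuming each attributes cell is a plain string of `key: value` pairs separated by commas, where nested values such as BusinessParking are wrapped in one level of braces. The sample rows and the helper name `parse_attributes` are hypothetical; the real Yelp column may be formatted differently (for example as a dict or a list), in which case the parsing step needs adjusting.

```python
import re

import pandas as pd

# Split on commas that are NOT inside {...}: a comma is kept intact when the
# text after it reaches a closing brace before any opening brace.
# Assumes braces are nested at most one level deep.
TOP_LEVEL_COMMA = re.compile(r",(?![^{}]*\})\s*")


def parse_attributes(cell):
    """Turn one hypothetical 'key: value, key: {...}' string into a dict."""
    if not isinstance(cell, str) or not cell.strip():
        return {}
    attrs = {}
    for item in TOP_LEVEL_COMMA.split(cell.strip()):
        # Split only on the first colon so nested "{a: b}" values survive.
        key, sep, value = item.partition(":")
        if sep:  # skip fragments without a key/value separator
            attrs[key.strip()] = value.strip()
    return attrs


# Hypothetical sample rows; the real column layout may differ.
df = pd.DataFrame({
    "name": ["Cafe A", "Diner B", "Bistro C"],
    "attributes": [
        "BikeParking: True, BusinessParking: {garage: False, street: True}, RestaurantsPriceRange2: 1",
        "RestaurantsPriceRange2: 3, WiFi: free",
        "BikeParking: False",
    ],
})

# One column per attribute name; rows lacking an attribute get NaN.
attr_df = pd.DataFrame(df["attributes"].map(parse_attributes).tolist(), index=df.index)
df = df.join(attr_df)

# Store the price range as a nullable integer column (missing -> <NA>).
df["RestaurantsPriceRange2"] = pd.to_numeric(df["RestaurantsPriceRange2"]).astype("Int64")
print(df[["name", "RestaurantsPriceRange2"]])
```

Expanding every attribute into its own column means rows with different numbers of attributes no longer matter: each row simply has NaN for the attributes it lacks, and the price range can then be filtered or grouped like any other column.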

victor