I'm trying to split a column called "category" containing strings into two new columns "category" and "subcategory"
It's based on a kickstarter dataset we collected from webrobots.io. The "category" fields contain instances that look like this:
In: frame.category[1]
Out: {"id":325,"name":"Calendars","slug":"publishing/calendars","position":4,"parent_id":18,"color":14867664,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/publishing/calendars"}}}
I'm trying to get the first part of all rows after '"slug":"' before the slash (publishing) into a new column "category", and the part after slash before the quotation mark (calendars) into a new column "subcategory". I've tried with str.split and str.extract and presume that extract is what i need, but I'm very new to regular expressions so all my attempts have failed
This is what I've tried for now, it just gives me two columns both containing NaN all the way through
frame["category"].str.extract(r'(slug":")(/)')
It would be great if the result came out as two new columns with each one of the two words separated by the slash after "slug":"
Edit: Thanks to Nev1111's idea of treating the column as its own dataframe and joris on this thread I've come to the following code which works perfectly, although it might not be the best solution
#Assigning 'category' to its own dataframe and reading it as a dictionary with each key as its own column
df=frame['category'].map(eval).apply(pd.Series)
#splitting "slug" and creating new columns based on the category and subcategory
frame[['category','subcategory']]=df['slug'].str.split('/',expand=True)
When printing "frame" i get the two new columns with category and subcategory