How to deconstructed a labeled row of keywords and weights into columns for keywords

Question

I'm looking for a way to take a dictionary in r or python take an array of dictionaries from an excel workbook and transform it such that each column is a keyword and the rows have the associated weights. Note each cell has content filled keywords and associated weights following this format "name :: abc; weight :: 2.0; name :: def; weight :: 40.03".

[Input]

 ID | keywords  
 1  | (name : abc; weight : 2.0)  
 2  | (name : def; weight : 3.1; name : abc; weight : 32.1)  
 3  | (name : ghi; weight 3.0)

[Output]

 ID | abc | def | ghi   
 1  | 2.0 | 0   | 0  
 2  | 32.1| 3.1 | 0  
 3  | 0   | 0   | 3.0

So the key pieces of this is that each keyword is preceded by "name :: " and followed by "; weight :: " indicating the weight though there isn't a set limit to the number of characters for each keyword or the digits used for the weight. Also the keywords aren't all in the same order.

I've tried looking converting the string into a dictionary and then trying to use Joris's fix (from Splitting dictionary/list inside a Pandas Column into Separate Columns) to transform the array of dictionaries but to no avail.

I have also tried reformatting the source file to remove the name and weight label.

df_dict = df.to_dict()

df[:,1].apply(pd.Series)

However this returns typeError: '(slice(None, None, None), 1)' is an invalid key

And I am starting to believe that this solution won't work if I have a column of dictionaries to transform vs. the single row in the solution referenced above.

How to deconstructed a labeled row of keywords and weights into columns for keywords

0 Answers0