This question is similar to this one pandas: split a string column into multiple columns and dynamically name columns. I modified the data as below
df = pd.DataFrame.from_dict({'study_id': {0: 'study1',
1: 'study2',
2: 'study3',
3: 'study4',
4: 'study5'},
'fuzzy_market': {0: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
1: '[Country: Germany], [Management experience: Yes]',
2: '[Country: United Kingdom], [Language: English]',
3: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
4: '[Age: 48-99]'}})
I would like the output to be as below.
study_id Age Country of Birth Country Language Management experience study1 18-67 Austria Austria German None study1 18-67 Germany Germany German None study2 None None Germany None Yes study3 None None United Kingdom English None
How should I modify these codes. Thank you.
p = df['fuzzy_market'].str.findall(r'([^:\[]+): ([^\]]+)')
df[['study_id']].join(pd.DataFrame(map(dict, p)))