0

This question is similar to this one pandas: split a string column into multiple columns and dynamically name columns. I modified the data as below

df = pd.DataFrame.from_dict({'study_id': {0: 'study1',
  1: 'study2',
  2: 'study3',
  3: 'study4',
  4: 'study5'},
 'fuzzy_market': {0: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
  1: '[Country: Germany], [Management experience: Yes]',
  2: '[Country: United Kingdom], [Language: English]',
  3: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
  4: '[Age: 48-99]'}})

I would like the output to be as below.

study_id    Age     Country of Birth    Country         Language      Management experience
study1     18-67    Austria             Austria         German        None
study1     18-67    Germany             Germany         German        None
study2     None     None                Germany         None          Yes
study3     None     None                United Kingdom  English       None

How should I modify these codes. Thank you.

p = df['fuzzy_market'].str.findall(r'([^:\[]+): ([^\]]+)')
df[['study_id']].join(pd.DataFrame(map(dict, p)))
Jason
  • 467
  • 2
  • 4
  • 12

1 Answers1

0

try this:

data = [*df.pop('fuzzy_market').str.findall(r'([^:\[]+): ([^\]]+)').map(dict)]
res = df.join(pd.DataFrame(data, index=df.index))
print(res)
ziying35
  • 1,190
  • 3
  • 6