
I have a dataframe in which one column holds several pieces of information in a 'key=value' format. Almost a hundred different keys can appear in that column, but for simplicity's sake I'll use an example with only 4 (_browser, _status, _city, tag):

id  name   properties
0   A      {_browser=Chrome, _status=TRUE, _city=Paris}
1   B      {_browser=null, _status=TRUE, _city=London, tag=XYZ}
2   C      {_status=FALSE, tag=ABC}

How can I split the properties string column into multiple columns, one per key?

The expected output is:

id  name   _browser    _status    _city    tag
0   A      Chrome      TRUE       Paris       
1   B      null        TRUE       London   XYZ
2   C                  FALSE               ABC

Note: the values can also contain spaces (e.g. _city=Rio de Janeiro).

Shubham Sharma
eduardoftdo

1 Answer

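Let's use str.findall with two regex capture groups (one for the key, one for the value) to extract the key-value pairs from the properties column, then build a new dataframe from the resulting dicts and join it back: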

df.join(pd.DataFrame(
    [dict(l) for l in df.pop('properties').str.findall(r'(\w+)=([^,\}]+)')]))

Result:

 id name _browser _status   _city  tag
  0    A   Chrome    TRUE   Paris  NaN
  1    B     null    TRUE  London  XYZ
  2    C      NaN   FALSE     NaN  ABC
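
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, adds one extra row (id 3, name D, which is my own illustration and not part of the original data) to show a value containing spaces, and passes index=df.index so the join still lines up if the dataframe does not have a default 0..n-1 index. The variable names pairs, expanded and result are just placeholders.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with a value containing spaces
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "name": ["A", "B", "C", "D"],
    "properties": [
        "{_browser=Chrome, _status=TRUE, _city=Paris}",
        "{_browser=null, _status=TRUE, _city=London, tag=XYZ}",
        "{_status=FALSE, tag=ABC}",
        "{_city=Rio de Janeiro}",
    ],
})

# (\w+) captures the key; ([^,}]+) captures everything up to the next comma or closing brace,
# so spaces inside a value are kept
pairs = df.pop("properties").str.findall(r"(\w+)=([^,}]+)")

# One dict per row; keys missing from a row become NaN in the new dataframe.
# Reusing df.index keeps the rows aligned when joining.
expanded = pd.DataFrame([dict(p) for p in pairs], index=df.index)
result = df.join(expanded)
print(result)
```

Note that this assumes every row in properties is a string. If the column can contain missing values, str.findall returns NaN for those rows and dict() will fail on them, so they would need to be filled or filtered first.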
Shubham Sharma