
I have a DataFrame in which one column holds dictionaries of different lengths.

It looks something like this:

   Chrom  POS  N_Allels  dict
0      1  345      2010  {"A":0.1,"T":0.22,"G":0.01}
1      1  357      1989  {"T":0.9}
2      1  365      1850  {"A":0.3,"G":0.2}

I want to explode the dict into two columns, with a new row for each key/value pair, so that the result looks like this:

   Chrom  POS  N_Allels  base  freq
0      1  345      2010  "A"   0.1
1      1  345      2010  "T"   0.22
2      1  345      2010  "G"   0.01
3      1  357      1989  "T"   0.9
4      1  365      1850  "A"   0.3
5      1  365      1850  "G"   0.2

Is there a good, clean way to do that?

I know about the df.explode() method, but it only creates a new row for each key of the dict and drops the values.

Georg B
  • Here is the answer to your question: https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows – Supertech Mar 29 '22 at 13:19
  • They suggest the df.explode() method. As I mentioned, this does not work, since it simply drops the value that belongs to each key. In general, this method creates a row for each entry in the list/dict, but I want a split into two columns, one for the key and one for the value. – Georg B Mar 29 '22 at 13:34

1 Answer


You could try the following:

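df_result = (
    df
    # Replace each dict with its (key, value) pairs
    .assign(dict=df["dict"].map(lambda d: d.items()))
    # One row per (key, value) pair
    .explode("dict")
    # Split each pair into its own columns
    .assign(
        base=lambda df: df["dict"].str.get(0),
        freq=lambda df: df["dict"].str.get(1)
    )
    .drop(columns="dict")
    .reset_index(drop=True)
)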

First replace each dictionary with its .items() view, then explode the dict column, and finally split the resulting (key, value) tuples into two columns: base (index 0) and freq (index 1).

Result for the sample:

   Chrom  POS  N_Allels base  freq
0      1  345      2010    A  0.10
1      1  345      2010    T  0.22
2      1  345      2010    G  0.01
3      1  357      1989    T  0.90
4      1  365      1850    A  0.30
5      1  365      1850    G  0.20
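
For anyone who wants to try this out, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same steps. Two details are my own additions rather than part of the snippet above: the items are wrapped in `list(...)` so that `explode` receives plain lists, and `freq` is cast to float, on the assumption that all values are numeric.

```python
import pandas as pd

# Sample frame rebuilt from the question (column names as given there).
df = pd.DataFrame({
    "Chrom": [1, 1, 1],
    "POS": [345, 357, 365],
    "N_Allels": [2010, 1989, 1850],
    "dict": [
        {"A": 0.1, "T": 0.22, "G": 0.01},
        {"T": 0.9},
        {"A": 0.3, "G": 0.2},
    ],
})

df_result = (
    df
    # Turn each dict into a list of (key, value) tuples.
    .assign(dict=df["dict"].map(lambda d: list(d.items())))
    # One row per tuple; the other columns are repeated.
    .explode("dict")
    # Split each tuple into two columns.
    .assign(
        base=lambda d: d["dict"].str.get(0),
        # Assumption: every value is numeric, so the cast is safe.
        freq=lambda d: d["dict"].str.get(1).astype(float),
    )
    .drop(columns="dict")
    .reset_index(drop=True)
)

print(df_result)
```

Without the cast, `freq` stays an object column, which may matter if you want to do arithmetic on it afterwards.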
Timus
  • This works. As @Georg B mentioned in his comment, applying `explode` directly does not work: `explode` iterates over each dictionary in the default way, which yields only the keys, so the values are lost. The approach from @Timus works because `assign(dict=df.dict.map(lambda d: d.items()))` replaces that default iteration with iteration over key/value pairs, making both available to `explode`. – Peter Barmettler Jul 13 '23 at 10:23
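
The difference described in the comment above can be seen on a single row. This is only a minimal sketch; the Series `s` is a made-up one-element example, not data from the question.

```python
import pandas as pd

# Hypothetical one-row Series holding a dict, for illustration only.
s = pd.Series([{"A": 0.3, "G": 0.2}])

# Plain iteration over a dict yields its keys only.
default_iteration = list(s.iloc[0])

# Exploding the raw dicts therefore keeps the keys and loses the values.
keys_only = s.explode().tolist()

# Exploding the (key, value) pairs keeps both.
pairs = s.map(lambda d: list(d.items())).explode().tolist()

print(default_iteration)
print(keys_only)
print(pairs)
```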